A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm
References
Aghdam, 2009, Text feature selection using ant colony optimization, Expert Systems with Applications, 36, 6843, 10.1016/j.eswa.2008.08.022
AlZamil, 2011, ROLEX-SP: rules of lexical syntactic patterns for free text categorization, Knowledge-Based Systems, 24, 58, 10.1016/j.knosys.2010.07.005
Baeza-Yates, 1999
Chang, 2008, Multilabel text categorization based on a new linear classifier learning method and a category-sensitive refinement method, Expert Systems with Applications, 34, 1948, 10.1016/j.eswa.2007.02.037
Cover, 1967, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, 13, 21, 10.1109/TIT.1967.1053964
Damerau, 2004, Text categorization for a comprehensive time-dependent benchmark, Information Processing and Management, 40, 209, 10.1016/S0306-4573(03)00006-2
ElAlami, 2009, A filter model for feature subset selection based on genetic algorithm, Knowledge-Based Systems, 22, 356, 10.1016/j.knosys.2009.02.006
Ferré, 1995, Selection of components in principal component analysis: a comparison of methods, Computational Statistics & Data Analysis, 19, 669, 10.1016/0167-9473(94)00020-J
Fuhr, 1991, A probabilistic learning approach for document indexing, ACM Transactions on Information Systems, 9, 223, 10.1145/125187.125189
Gen, 2000, vol. 68
Goldberg, 1989
Holland, 1975
T. Joachims, A probabilistic analysis of the rocchio algorithm with TFIDF for text categorization, in: Proceedings of Fourteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, 1997, pp. 143–151.
T. Joachims, Text categorization with support vector machines: learning with many relevant features, in: Proceedings of the 10th European Conference on Machine Learning, 1998, pp. 137–142.
Jolliffe, 1986
S.L.Y. Lam, D.L. Lee, Feature reduction for neural network based text categorization, in: Sixth International Conference on Database Systems for Advanced Applications (DASFAA’99), 1999, p. 195.
Lam, 2003, Automatic textual document categorization based on generalized instance sets and a metamodel, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 628, 10.1109/TPAMI.2003.1195997
D.D. Lewis, Reuters-21578 text categorization test collection, distribution 1.0. <http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html>, 1997.
Li, 2009, Combination of modified BPNN algorithms and an efficient feature selection method for text categorization, Information Processing and Management, 45, 329, 10.1016/j.ipm.2008.09.004
Y. Li, D.F. Hsu, S.M. Chung, Combining multiple feature selection methods for text categorization by using rank-score characteristics, in: 21st IEEE International Conference on Tools with Artificial Intelligence, 2009, pp. 508–517.
Liu, 2007, Combined mining of Web server logs and web contents for classifying user navigation patterns and predicting users’ future requests, Data and Knowledge Engineering, 61, 304, 10.1016/j.datak.2006.06.001
L. Liu, J. Kang, J. Yu, Z. Wang, A comparative study on unsupervised feature selection methods for text clustering, in: Proceedings of NLP-KE’05, 2005, pp. 597–601.
A. McCallum, K. Nigam, A comparison of event models for Naive Bayes text classification, in: AAAI’98 Workshop on Learning for Text Categorization, 1998, pp. 41–48.
Mitchell, 1997
Mitra, 2007, Text classification: a least square support vector machine approach, Applied Soft Computing, 7, 908, 10.1016/j.asoc.2006.04.002
Porter, 1980, An algorithm for suffix stripping, Program (Automated Library and Information Systems), 14, 130, 10.1108/eb046814
Quinlan, 1986, Induction of decision trees, Machine Learning, 1, 81, 10.1007/BF00116251
Salton, 1988, Term-weighting approaches in automatic text retrieval, Information Processing and Management, 24, 513, 10.1016/0306-4573(88)90021-0
F. Sebastiani, A tutorial on automated text categorisation, in: Proceedings of ASAI-99, 1st Argentinian Symposium on Artificial Intelligence, Buenos Aires, AR, 1999, pp. 17–35.
Selamat, 2004, Web page feature selection and classification using neural networks, Information Sciences, 158, 69, 10.1016/j.ins.2003.03.003
N. Slonim, N. Tishby, Document clustering using word clusters via the information bottleneck method, in: Proceedings of SIGIR’00, 2000, pp. 208–215.
Song, 2009, Genetic algorithm for text clustering based on latent semantic indexing, Computers and Mathematics with Applications, 57, 1901, 10.1016/j.camwa.2008.10.010
J.-T. Sun, Z. Chen, H.-J. Zeng, Y. Lu, C.-Y. Shi, W.-Y. Ma, Supervised latent semantic indexing for document categorization, in: ICDM, IEEE Press, 2004, pp. 535–538.
Tan, 2006, An effective refinement strategy for KNN text classifier, Expert Systems with Applications, 30, 290, 10.1016/j.eswa.2005.07.019
Valle, 1999, Selection of the number of principal components: the variance of the reconstruction error criterion with a comparison to other methods, Industrial & Engineering Chemistry Research, 38, 4389, 10.1021/ie990110i
Van Rijsbergen, 1979
Warne, 2004, Statistical and computational intelligence techniques for inferential model development: a comparative evaluation and a novel proposition for fusion, Engineering Applications of Artificial Intelligence, 17, 871, 10.1016/j.engappai.2004.08.020
Wyse, 1980, A critical evaluation of intrinsic dimensionality algorithms, Pattern Recognition in Practice, 415
Yang, 1997, An evaluation of statistical approaches to text categorization, Information Retrieval, 1, 76
Y. Yang, J.O. Pedersen, A comparative study on feature selection in text categorization, in: Proceedings of the 14th International Conference on Machine Learning, 1997, pp. 412–420.
Yu, 2008, Latent semantic analysis for text categorization using neural network, Knowledge-Based Systems, 21, 900, 10.1016/j.knosys.2008.03.045
Zhang, 2006, Multilabel neural networks with applications to functional genomics and text categorization, IEEE Transactions on Knowledge and Data Engineering, 18, 1338, 10.1109/TKDE.2006.162
Zhang, 2007, Artificial neural networks based on principal component analysis input selection for clinical pattern recognition analysis, Talanta, 73, 68, 10.1016/j.talanta.2007.02.030
Zhang, 2008, Text classification based on multi-word with support vector machine, Knowledge-Based Systems, 21, 879, 10.1016/j.knosys.2008.03.044
W. Zhao, Y. Wang, D. Li, A dynamic feature selection method based on combination of GA with K-means, in: 2nd International Conference on Industrial Mechatronics and Automation, 2010, pp. 271–274.
C. Zifeng, X. Baowen, Z. Weifeng, J. Dawei, X. Junling, CLDA: feature selection for text categorization based on constrained LDA, in: International Conference on Semantic Computing (ICSC 2007), 2007, pp. 702–712.