A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm

Knowledge-Based Systems - Volume 24 - Pages 1024-1032 - 2011

Harun Uğuz¹

¹Department of Computer Engineering, Selçuk University, Konya, Turkey

