Improving performance of classification on incomplete data using feature selection and clustering

Applied Soft Computing - Volume 73 - Pages 848-861 - 2018

Cao Truong Tran¹,², Mengjie Zhang¹, Peter Andreae¹, Bing Xue¹, Lam Thu Bui²

¹School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand

²Research Group of Computational Intelligence, Le Quy Don Technical University, 236 Hoang Quoc Viet St, Hanoi, Viet Nam

References

Duda, 2012
García-Laencina, 2010, Pattern classification with missing data: a review, Neural Comput. Appl., 19, 263, 10.1007/s00521-009-0295-6
M. Lichman, UCI machine learning repository, (2013). URL http://archive.ics.uci.edu/ml.
Little, 2014
Farhangfar, 2007, A novel framework for imputation of missing values in databases, IEEE Trans. Syst. Man Cybern.-Part A: Syst. Humans, 37, 692, 10.1109/TSMCA.2007.902631
Silva-Ramírez, 2015, Single imputation with multilayer perceptron and multiple imputation combining multilayer perceptron and k-nearest neighbours for monotone patterns, Appl. Soft Comput., 29, 65, 10.1016/j.asoc.2014.09.052
Farhangfar, 2008, Impact of imputation of missing values on classification error for discrete data, Pattern Recognit., 41, 3692, 10.1016/j.patcog.2008.05.019
White, 2011, Multiple imputation using chained equations: issues and guidance for practice, Statist. Med., 30, 377, 10.1002/sim.4067
Tran, 2018, An effective and efficient approach to classification with incomplete data, Knowl.-Based Syst., 154, 1, 10.1016/j.knosys.2018.05.013
Fahad, 2014, A survey of clustering algorithms for big data: Taxonomy and empirical analysis, IEEE Trans. Emerg. Top. Comput., 2, 267, 10.1109/TETC.2014.2330519
Jose-Garcia, 2016, Automatic clustering using nature-inspired metaheuristics: A survey, Appl. Soft Comput., 41, 192, 10.1016/j.asoc.2015.12.001
Xue, 2016, A survey on evolutionary computation approaches to feature selection, IEEE Trans. Evol. Comput., 20, 606, 10.1109/TEVC.2015.2504420
Storn, 1997, Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces, J. Glob. Optim., 11, 341, 10.1023/A:1008202821328
Al-Ani, 2013, Feature subset selection using differential evolution and a wheel based search strategy, Swarm Evol. Comput., 9, 15, 10.1016/j.swevo.2012.09.003
B. Xue, W. Fu, M. Zhang, Multi-objective feature selection in classification: A differential evolution approach, in: SEAL, 2014, pp. 516–528.
Batista, 2002, A study of k-nearest neighbour as an imputation method, HIS, 87, 251
Acuna, 2004, The treatment of missing values and its effect on classifier accuracy, Classification clustering Data Min. Appl., 639, 10.1007/978-3-642-17103-1_60
Buuren, 2011, mice: Multivariate imputation by chained equations in R, J. Statist. Softw., 45, 10.18637/jss.v045.i03
Royston, 2011, Multiple imputation by chained equations (MICE): implementation in Stata, J. Statist. Softw., 45, 1, 10.18637/jss.v045.i04
Luengo, 2012, On the choice of the best imputation methods for missing values considering three groups of classification methods, Knowl. Inf. Syst., 32, 77, 10.1007/s10115-011-0424-2
Batista, 2003, An analysis of four missing data treatment methods for supervised learning, Appl. Artif. Intell., 17, 519, 10.1080/713827181
Liu, 2013, Comparison of five iterative imputation methods for multivariate classification, Chemom. Intell. Lab. Syst., 120, 106, 10.1016/j.chemolab.2012.11.010
C.T. Tran, M. Zhang, P. Andreae, B. Xue, L.T. Bui, Multiple imputation and ensemble learning for classification with incomplete data, in: Intelligent and Evolutionary Systems: The 20th Asia Pacific Symposium, IES 2016, Canberra, Australia, November 2016, Proceedings, 2017, pp. 401–415.
Xue, 2017, Evolutionary feature manipulation in data mining/big data, ACM SIGEVOlution, 10, 4, 10.1145/3089251.3089252
C. Larose, Model-based clustering of incomplete data.
Kanungo, 2002, An efficient k-means clustering algorithm: Analysis and implementation, IEEE Trans. Pattern Anal. Mach. Intell., 24, 881, 10.1109/TPAMI.2002.1017616
Li, 2004, Towards missing data imputation: a study of fuzzy k-means clustering method, 573
C. Zhang, Y. Qin, X. Zhu, J. Zhang, S. Zhang, Clustering-based missing value imputation for data preprocessing, in: Industrial Informatics, 2006 IEEE International Conference on, 2006, pp. 1081–1086.
Zhang, 2008, Missing value imputation based on data clustering, 128
B.M. Patil, R.C. Joshi, D. Toshniwal, Missing value on K-mean clustering with weighted distance, in: International Conference on Contemporary Computing, 2010, pp. 600–609.
Gajawada, 2012, Missing value imputation method based on clustering and nearest neighbours, Int. J. Future Comput. Commun., 1, 206, 10.7763/IJFCC.2012.V1.54
Tian, 2013, Clustering-based multiple imputation via gray relational analysis for missing data and its application to aerospace field, Sci. World J., 2013, 10.1155/2013/720392
Tian, 2014, Missing data analyses: a hybrid multiple imputation algorithm using Gray System Theory and entropy based on clustering, Appl. Intell., 40, 376, 10.1007/s10489-013-0469-x
S. Nikfalazar, C.-H. Yeh, S. Bedingfield, H.A. Khorshidi, A new iterative fuzzy clustering algorithm for multiple imputation of missing data, in: Fuzzy Systems (FUZZ-IEEE), 2017 IEEE International Conference on, 2017, pp. 1–6, https://ieeexplore.ieee.org/document/8015560.
Tsai, 2016, Combining instance selection for better missing value imputation, J. Syst. Softw., 122, 63, 10.1016/j.jss.2016.08.093
P. Meesad, K. Hengpraprohm, Combination of knn-based feature selection and knn-based missing-value imputation of microarray data, in: Innovative Computing Information and Control, 2008. ICICIC’08. 3rd International Conference on, 2008, pp. 341–341.
Aussem, 2010, A conservative feature subset selection algorithm with missing data, Neurocomputing, 73, 585, 10.1016/j.neucom.2009.05.019
Q. Lou, Z. Obradovic, Margin-based feature selection in incomplete data, in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012, pp. 1040–1046.
Doquire, 2012, Feature selection with missing data using mutual information estimators, Neurocomputing, 90, 3, 10.1016/j.neucom.2012.02.031
Qian, 2015, Mutual information criterion for feature selection from incomplete data, Neurocomputing, 168, 210, 10.1016/j.neucom.2015.05.105
Long, 2015, Variable selection in the presence of missing data: resampling and imputation, Biostatistics, 16, 596, 10.1093/biostatistics/kxv003
Tran, 2016, Improving performance for classification with incomplete data using wrapper-based feature selection, Evol. Intell., 9, 81, 10.1007/s12065-016-0141-6
C.T. Tran, M. Zhang, P. Andreae, B. Xue, Bagging and feature selection for classification with incomplete data, in: European Conference on the Applications of Evolutionary Computation, 2017, pp. 471–486.
Hall, 2009, The WEKA data mining software: an update, ACM SIGKDD Explor. Newslett., 11, 10, 10.1145/1656274.1656278
De Souto, 2015, Impact of missing data imputation methods on gene expression clustering and classification, BMC Bioinform., 16, 64, 10.1186/s12859-015-0494-3
Yu, 2013, Regularized extreme learning machine for regression with missing data, Neurocomputing, 102, 45, 10.1016/j.neucom.2012.02.040
Demšar, 2006, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1
