Improving performance of classification on incomplete data using feature selection and clustering

Applied Soft Computing - Volume 73 - Pages 848-861 - 2018
Cao Truong Tran¹˒², Mengjie Zhang¹, Peter Andreae¹, Bing Xue¹, Lam Thu Bui²
¹School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
²Research Group of Computational Intelligence, Le Quy Don Technical University, 236 Hoang Quoc Viet St, Hanoi, Viet Nam

References

Duda, 2012
García-Laencina, 2010, Pattern classification with missing data: a review, Neural Comput. Appl., 19, 263, 10.1007/s00521-009-0295-6
M. Lichman, UCI machine learning repository, 2013, URL http://archive.ics.uci.edu/ml
Little, 2014
Farhangfar, 2007, A novel framework for imputation of missing values in databases, IEEE Trans. Syst. Man Cybern.-Part A: Syst. Humans, 37, 692, 10.1109/TSMCA.2007.902631
Silva-Ramírez, 2015, Single imputation with multilayer perceptron and multiple imputation combining multilayer perceptron and k-nearest neighbours for monotone patterns, Appl. Soft Comput., 29, 65, 10.1016/j.asoc.2014.09.052
Farhangfar, 2008, Impact of imputation of missing values on classification error for discrete data, Pattern Recognit., 41, 3692, 10.1016/j.patcog.2008.05.019
White, 2011, Multiple imputation using chained equations: issues and guidance for practice, Statist. Med., 30, 377, 10.1002/sim.4067
Tran, 2018, An effective and efficient approach to classification with incomplete data, Knowl.-Based Syst., 154, 1, 10.1016/j.knosys.2018.05.013
Fahad, 2014, A survey of clustering algorithms for big data: Taxonomy and empirical analysis, IEEE Trans. Emerg. Top. Comput., 2, 267, 10.1109/TETC.2014.2330519
Jose-Garcia, 2016, Automatic clustering using nature-inspired metaheuristics: A survey, Appl. Soft Comput., 41, 192, 10.1016/j.asoc.2015.12.001
Xue, 2016, A survey on evolutionary computation approaches to feature selection, IEEE Trans. Evol. Comput., 20, 606, 10.1109/TEVC.2015.2504420
Storn, 1997, Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces, J. Glob. Optim., 11, 341, 10.1023/A:1008202821328
Al-Ani, 2013, Feature subset selection using differential evolution and a wheel based search strategy, Swarm Evol. Comput., 9, 15, 10.1016/j.swevo.2012.09.003
B. Xue, W. Fu, M. Zhang, Multi-objective feature selection in classification: A differential evolution approach, in: SEAL, 2014, pp. 516–528.
Batista, 2002, A study of k-nearest neighbour as an imputation method, HIS, 87, 251
Acuna, 2004, The treatment of missing values and its effect on classifier accuracy, Classification clustering Data Min. Appl., 639, 10.1007/978-3-642-17103-1_60
Buuren, 2011, mice: Multivariate imputation by chained equations in R, J. Statist. Softw., 45, 10.18637/jss.v045.i03
Royston, 2011, Multiple imputation by chained equations (MICE): implementation in Stata, J. Statist. Softw., 45, 1, 10.18637/jss.v045.i04
Luengo, 2012, On the choice of the best imputation methods for missing values considering three groups of classification methods, Knowl. Inf. Syst., 32, 77, 10.1007/s10115-011-0424-2
Batista, 2003, An analysis of four missing data treatment methods for supervised learning, Appl. Artif. Intell., 17, 519, 10.1080/713827181
Liu, 2013, Comparison of five iterative imputation methods for multivariate classification, Chemom. Intell. Lab. Syst., 120, 106, 10.1016/j.chemolab.2012.11.010
C.T. Tran, M. Zhang, P. Andreae, B. Xue, L.T. Bui, Multiple imputation and ensemble learning for classification with incomplete data, in: Intelligent and Evolutionary Systems: The 20th Asia Pacific Symposium, IES 2016, Canberra, Australia, November 2016, Proceedings, 2017, pp. 401–415.
Xue, 2017, Evolutionary feature manipulation in data mining/big data, ACM SIGEVOlution, 10, 4, 10.1145/3089251.3089252
C. Larose, Model-based clustering of incomplete data.
Kanungo, 2002, An efficient k-means clustering algorithm: Analysis and implementation, IEEE Trans. Pattern Anal. Mach. Intell., 24, 881, 10.1109/TPAMI.2002.1017616
Li, 2004, Towards missing data imputation: a study of fuzzy k-means clustering method, 573
C. Zhang, Y. Qin, X. Zhu, J. Zhang, S. Zhang, Clustering-based missing value imputation for data preprocessing, in: Industrial Informatics, 2006 IEEE International Conference on, 2006, pp. 1081–1086.
Zhang, 2008, Missing value imputation based on data clustering, 128
B.M. Patil, R.C. Joshi, D. Toshniwal, Missing value on K-mean clustering with weighted distance, in: International Conference on Contemporary Computing, 2010, pp. 600–609.
Gajawada, 2012, Missing value imputation method based on clustering and nearest neighbours, Int. J. Future Comput. Commun., 1, 206, 10.7763/IJFCC.2012.V1.54
Tian, 2013, Clustering-based multiple imputation via gray relational analysis for missing data and its application to aerospace field, Sci. World J., 2013, 10.1155/2013/720392
Tian, 2014, Missing data analyses: a hybrid multiple imputation algorithm using Gray System Theory and entropy based on clustering, Appl. Intell., 40, 376, 10.1007/s10489-013-0469-x
S. Nikfalazar, C.-H. Yeh, S. Bedingfield, H.A. Khorshidi, A new iterative fuzzy clustering algorithm for multiple imputation of missing data, in: Fuzzy Systems (FUZZ-IEEE), 2017 IEEE International Conference on, 2017, pp. 1–6, https://ieeexplore.ieee.org/document/8015560.
Tsai, 2016, Combining instance selection for better missing value imputation, J. Syst. Softw., 122, 63, 10.1016/j.jss.2016.08.093
P. Meesad, K. Hengpraprohm, Combination of knn-based feature selection and knn-based missing-value imputation of microarray data, in: Innovative Computing Information and Control, 2008. ICICIC'08. 3rd International Conference on, 2008, pp. 341–341.
Aussem, 2010, A conservative feature subset selection algorithm with missing data, Neurocomputing, 73, 585, 10.1016/j.neucom.2009.05.019
Q. Lou, Z. Obradovic, Margin-based feature selection in incomplete data, in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012, pp. 1040–1046.
Doquire, 2012, Feature selection with missing data using mutual information estimators, Neurocomputing, 90, 3, 10.1016/j.neucom.2012.02.031
Qian, 2015, Mutual information criterion for feature selection from incomplete data, Neurocomputing, 168, 210, 10.1016/j.neucom.2015.05.105
Long, 2015, Variable selection in the presence of missing data: resampling and imputation, Biostatistics, 16, 596, 10.1093/biostatistics/kxv003
Tran, 2016, Improving performance for classification with incomplete data using wrapper-based feature selection, Evol. Intell., 9, 81, 10.1007/s12065-016-0141-6
C.T. Tran, M. Zhang, P. Andreae, B. Xue, Bagging and feature selection for classification with incomplete data, in: European Conference on the Applications of Evolutionary Computation, 2017, pp. 471–486.
Hall, 2009, The WEKA data mining software: an update, ACM SIGKDD Explor. Newslett., 11, 10, 10.1145/1656274.1656278
De Souto, 2015, Impact of missing data imputation methods on gene expression clustering and classification, BMC Bioinform., 16, 64, 10.1186/s12859-015-0494-3
Yu, 2013, Regularized extreme learning machine for regression with missing data, Neurocomputing, 102, 45, 10.1016/j.neucom.2012.02.040
Demšar, 2006, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1