Improving performance of classification on incomplete data using feature selection and clustering
References
Duda, 2012
García-Laencina, 2010, Pattern classification with missing data: a review, Neural Comput. Appl., 19, 263, 10.1007/s00521-009-0295-6
M. Lichman, UCI machine learning repository, (2013). URL http://archive.ics.uci.edu/ml.
Little, 2014
Farhangfar, 2007, A novel framework for imputation of missing values in databases, IEEE Trans. Syst. Man Cybern.-Part A: Syst. Humans, 37, 692, 10.1109/TSMCA.2007.902631
Silva-Ramírez, 2015, Single imputation with multilayer perceptron and multiple imputation combining multilayer perceptron and k-nearest neighbours for monotone patterns, Appl. Soft Comput., 29, 65, 10.1016/j.asoc.2014.09.052
Farhangfar, 2008, Impact of imputation of missing values on classification error for discrete data, Pattern Recognit., 41, 3692, 10.1016/j.patcog.2008.05.019
White, 2011, Multiple imputation using chained equations: issues and guidance for practice, Statist. Med., 30, 377, 10.1002/sim.4067
Tran, 2018, An effective and efficient approach to classification with incomplete data, Knowl.-Based Syst., 154, 1, 10.1016/j.knosys.2018.05.013
Fahad, 2014, A survey of clustering algorithms for big data: Taxonomy and empirical analysis, IEEE Trans. Emerg. Top. Comput., 2, 267, 10.1109/TETC.2014.2330519
Jose-Garcia, 2016, Automatic clustering using nature-inspired metaheuristics: A survey, Appl. Soft Comput., 41, 192, 10.1016/j.asoc.2015.12.001
Xue, 2016, A survey on evolutionary computation approaches to feature selection, IEEE Trans. Evol. Comput., 20, 606, 10.1109/TEVC.2015.2504420
Storn, 1997, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, J. Glob. Optim., 11, 341, 10.1023/A:1008202821328
Al-Ani, 2013, Feature subset selection using differential evolution and a wheel based search strategy, Swarm Evol. Comput., 9, 15, 10.1016/j.swevo.2012.09.003
B. Xue, W. Fu, M. Zhang, Multi-objective feature selection in classification: A differential evolution approach, in: SEAL, 2014, pp. 516–528.
Batista, 2002, A study of k-nearest neighbour as an imputation method, HIS, 87, 251
Acuna, 2004, The treatment of missing values and its effect on classifier accuracy, Classification, Clustering, and Data Mining Applications, 639, 10.1007/978-3-642-17103-1_60
Buuren, 2011, mice: Multivariate imputation by chained equations in R, J. Statist. Softw., 45, 10.18637/jss.v045.i03
Royston, 2011, Multiple imputation by chained equations (MICE): implementation in Stata, J. Statist. Softw., 45, 1, 10.18637/jss.v045.i04
Luengo, 2012, On the choice of the best imputation methods for missing values considering three groups of classification methods, Knowl. Inf. Syst., 32, 77, 10.1007/s10115-011-0424-2
Batista, 2003, An analysis of four missing data treatment methods for supervised learning, Appl. Artif. Intell., 17, 519, 10.1080/713827181
Liu, 2013, Comparison of five iterative imputation methods for multivariate classification, Chemom. Intell. Lab. Syst., 120, 106, 10.1016/j.chemolab.2012.11.010
C.T. Tran, M. Zhang, P. Andreae, B. Xue, L.T. Bui, Multiple imputation and ensemble learning for classification with incomplete data, in: Intelligent and Evolutionary Systems: The 20th Asia Pacific Symposium, IES 2016, Canberra, Australia, November 2016, Proceedings, 2017, pp. 401–415.
Xue, 2017, Evolutionary feature manipulation in data mining/big data, ACM SIGEVOlution, 10, 4, 10.1145/3089251.3089252
C. Larose, Model-based clustering of incomplete data.
Kanungo, 2002, An efficient k-means clustering algorithm: Analysis and implementation, IEEE Trans. Pattern Anal. Mach. Intell., 24, 881, 10.1109/TPAMI.2002.1017616
Li, 2004, Towards missing data imputation: a study of fuzzy k-means clustering method, 573
C. Zhang, Y. Qin, X. Zhu, J. Zhang, S. Zhang, Clustering-based missing value imputation for data preprocessing, in: Industrial Informatics, 2006 IEEE International Conference on, 2006, pp. 1081–1086.
Zhang, 2008, Missing value imputation based on data clustering, 128
B.M. Patil, R.C. Joshi, D. Toshniwal, Missing value on K-mean clustering with weighted distance, in: International Conference on Contemporary Computing, 2010, pp. 600–609.
Gajawada, 2012, Missing value imputation method based on clustering and nearest neighbours, Int. J. Future Comput. Commun., 1, 206, 10.7763/IJFCC.2012.V1.54
Tian, 2013, Clustering-based multiple imputation via gray relational analysis for missing data and its application to aerospace field, Sci. World J., 2013, 10.1155/2013/720392
Tian, 2014, Missing data analyses: a hybrid multiple imputation algorithm using Gray System Theory and entropy based on clustering, Appl. Intell., 40, 376, 10.1007/s10489-013-0469-x
S. Nikfalazar, C.-H. Yeh, S. Bedingfield, H.A. Khorshidi, A new iterative fuzzy clustering algorithm for multiple imputation of missing data, in: Fuzzy Systems (FUZZ-IEEE), 2017 IEEE International Conference on, 2017, pp. 1–6, https://ieeexplore.ieee.org/document/8015560.
Tsai, 2016, Combining instance selection for better missing value imputation, J. Syst. Softw., 122, 63, 10.1016/j.jss.2016.08.093
P. Meesad, K. Hengpraprohm, Combination of knn-based feature selection and knn-based missing-value imputation of microarray data, in: Innovative Computing Information and Control, 2008. ICICIC’08. 3rd International Conference on, 2008, pp. 341–341.
Aussem, 2010, A conservative feature subset selection algorithm with missing data, Neurocomputing, 73, 585, 10.1016/j.neucom.2009.05.019
Q. Lou, Z. Obradovic, Margin-based feature selection in incomplete data, in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012, pp. 1040–1046.
Doquire, 2012, Feature selection with missing data using mutual information estimators, Neurocomputing, 90, 3, 10.1016/j.neucom.2012.02.031
Qian, 2015, Mutual information criterion for feature selection from incomplete data, Neurocomputing, 168, 210, 10.1016/j.neucom.2015.05.105
Long, 2015, Variable selection in the presence of missing data: resampling and imputation, Biostatistics, 16, 596, 10.1093/biostatistics/kxv003
Tran, 2016, Improving performance for classification with incomplete data using wrapper-based feature selection, Evol. Intell., 9, 81, 10.1007/s12065-016-0141-6
C.T. Tran, M. Zhang, P. Andreae, B. Xue, Bagging and feature selection for classification with incomplete data, in: European Conference on the Applications of Evolutionary Computation, 2017, pp. 471–486.
Hall, 2009, The WEKA data mining software: an update, ACM SIGKDD Explor. Newslett., 11, 10, 10.1145/1656274.1656278
De Souto, 2015, Impact of missing data imputation methods on gene expression clustering and classification, BMC Bioinform., 16, 64, 10.1186/s12859-015-0494-3
Yu, 2013, Regularized extreme learning machine for regression with missing data, Neurocomputing, 102, 45, 10.1016/j.neucom.2012.02.040
Demšar, 2006, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1