Benchmark for filter methods for feature selection in high-dimensional classification data

Computational Statistics and Data Analysis - Volume 143 - Page 106839 - 2020
Andrea Bommert¹, Xudong Sun², Bernd Bischl², Jörg Rahnenführer¹, Michel Lang¹
¹ Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany
² Department of Statistics, Ludwig-Maximilians-Universität München, Ludwigstr. 33, 80539 München, Germany

References

Aphinyanaphongs, 2014, A comprehensive empirical comparison of modern supervised classification and feature selection methods for text categorization, J. Assoc. Inf. Sci. Technol., 65, 1964, 10.1002/asi.23110

Biau, 2019, Accelerated gradient boosting, Mach. Learn., 108, 971, 10.1007/s10994-019-05787-1

Bischl, 2016, mlr: Machine learning in R, J. Mach. Learn. Res., 17, 1

Bischl, 2012, Resampling methods for meta-model validation with recommendations for evolutionary computation, Evol. Comput., 20, 249, 10.1162/EVCO_a_00069

Bolón-Canedo, 2013, A review of feature selection methods on synthetic data, Knowl. Inf. Syst., 34, 483, 10.1007/s10115-012-0487-8

Bolón-Canedo, 2014, A review of microarray datasets and applied feature selection methods, Inform. Sci., 282, 111, 10.1016/j.ins.2014.05.042

Bommert, 2017, A multicriteria approach to find predictive and sparse models with stable feature selection for high-dimensional data, Comput. Math. Methods Med., 2017, 10.1155/2017/7907163

Breiman, 2001, Random forests, Mach. Learn., 45, 5, 10.1023/A:1010933404324

Breiman, 1984

Brezočnik, 2018, Swarm intelligence algorithms for feature selection: A review, Appl. Sci., 8, 10.3390/app8091521

Brown, 2012, Conditional likelihood maximisation: A unifying framework for information theoretic feature selection, J. Mach. Learn. Res., 13, 27

Cai, 2018, Feature selection in machine learning: A new perspective, Neurocomputing, 300, 70, 10.1016/j.neucom.2017.11.077

Casalicchio, 2017, OpenML: An R package to connect to the machine learning platform OpenML, Comput. Stat., 1

Chandrashekar, 2014, A survey on feature selection methods, Comput. Electr. Eng., 40, 16, 10.1016/j.compeleceng.2013.11.024

Darshan, 2018, Performance evaluation of filter-based feature selection techniques in classifying portable executable files, Procedia Comput. Sci., 125, 346, 10.1016/j.procs.2017.12.046

Dash, 1997, Feature selection for classification, Intell. Data Anal., 1, 131, 10.3233/IDA-1997-1302

Fayyad, 1993

Fernández-Delgado, 2014, Do we need hundreds of classifiers to solve real world classification problems?, J. Mach. Learn. Res., 15, 3133

Fleuret, 2004, Fast binary feature selection with conditional mutual information, J. Mach. Learn. Res., 5, 1531

Forman, 2003, An extensive empirical study of feature selection metrics for text classification, J. Mach. Learn. Res., 3, 1289

Ghosh, 2019, Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods, Med. Biol. Eng. Comput., 57, 159, 10.1007/s11517-018-1874-4

Guyon, 2003, An introduction to variable and feature selection, J. Mach. Learn. Res., 3, 1157

Hall, 1999

Hanley, 1982, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, 29, 10.1148/radiology.143.1.7063747

Hira, 2015, A review of feature selection and feature extraction methods applied on microarray data, Adv. Bioinform., 2015, 10.1155/2015/198363

Hoque, 2018, EFS-MI: An ensemble feature selection method for classification, Complex Intell. Syst., 4, 105, 10.1007/s40747-017-0060-x

Huang, 2018, Feature clustering based support vector machine recursive feature elimination for gene selection, Appl. Intell., 48, 594, 10.1007/s10489-017-0992-2

Inza, 2004, Filter versus wrapper gene selection approaches in DNA microarray domains, Artif. Intell. Med., 31, 91, 10.1016/j.artmed.2004.01.007

Izenman, 2013

Jović, A., Brkić, K., Bogunović, N., 2015. A review of feature selection methods with applications. In: 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, pp. 1200–1205.

Kalousis, 2007, Stability of feature selection algorithms: A study on high-dimensional spaces, Knowl. Inf. Syst., 12, 95, 10.1007/s10115-006-0040-8

Karatzoglou, 2004, kernlab – an S4 package for kernel methods in R, J. Stat. Softw., 11, 1, 10.18637/jss.v011.i09

Ke, 2018, A new filter feature selection based on criteria fusion for gene microarray data, IEEE Access, 6, 61065, 10.1109/ACCESS.2018.2873634

Kerschke, 2019, Automated algorithm selection on continuous black-box problems by combining exploratory landscape analysis and machine learning, Evol. Comput., 27, 99, 10.1162/evco_a_00236

Kittler, 1978, Feature set search algorithms, 41

Kohavi, 1997, Wrappers for feature subset selection, Artif. Intell., 97, 273, 10.1016/S0004-3702(97)00043-X

Kruskal, 1952, Use of ranks in one-criterion variance analysis, J. Amer. Statist. Assoc., 47, 583, 10.1080/01621459.1952.10483441

Kursa, 2018

Lang, 2017, batchtools: Tools for R to work on batch systems, J. Open Source Softw., 2, 10.21105/joss.00135

Larose, 2014

Lazar, 2012, A survey on filter techniques for feature selection in gene expression microarray analysis, IEEE/ACM Trans. Comput. Biol. Bioinform., 9, 1106, 10.1109/TCBB.2012.33

Li, 2018, Feature selection: A data perspective, ACM Comput. Surv., 50, 10.1145/3136625

Liu, 2004, A comparative study on feature selection methods for drug discovery, J. Chem. Inf. Comput. Sci., 44, 1823, 10.1021/ci049875d

Liu, 2002, A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns, Genome Inform., 13, 51

Liu, 2005, Toward integrating feature selection algorithms for classification and clustering, IEEE Trans. Knowl. Data Eng., 17, 491, 10.1109/TKDE.2005.66

Meyer, 2008, Information-theoretic feature selection in microarray data using variable complementarity, IEEE J. Sel. Top. Sign. Proces., 2, 261, 10.1109/JSTSP.2008.923858

Mohtashami, 2019, A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts, Iran. J. Fuzzy Syst., 16, 165

Nogueira, S., Brown, G., 2016. Measuring the stability of feature selection. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 442–457.

Peng, 2005, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell., 27, 1226, 10.1109/TPAMI.2005.159

R Core Team, 2017

Ramey, 2016

Rasch, 2011

Ritchie, 2015, limma powers differential expression analyses for RNA-sequencing and microarray studies, Nucleic Acids Res., 43, 10.1093/nar/gkv007

Romanski, 2016

Saeys, 2007, A review of feature selection techniques in bioinformatics, Bioinformatics, 23, 2507, 10.1093/bioinformatics/btm344

Sammut, 2011

Sánchez-Maroño, N., Alonso-Betanzos, A., Tombilla-Sanromán, M., 2007. Filter methods for feature selection – A comparative study. In: International Conference on Intelligent Data Engineering and Automated Learning. pp. 178–187.

Schliep, 2016

Simon, 2011, Regularization paths for Cox’s proportional hazards model via coordinate descent, J. Stat. Softw., 39, 1, 10.18637/jss.v039.i05

Smyth, 2004, Linear models and empirical Bayes methods for assessing differential expression in microarray experiments, Stat. Appl. Genet. Mol. Biol., 3, 10.2202/1544-6115.1027

Strobl, 2008, Conditional variable importance for random forests, BMC Bioinformatics, 9

Tang, 2014, Feature selection for classification: A review, 37

Therneau, 2017

Tibshirani, 1996, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Stat. Methodol., 58, 267, 10.1111/j.2517-6161.1996.tb02080.x

Tibshirani, 2011

Tusher, 2001, Significance analysis of microarrays applied to the ionizing radiation response, Proc. Natl. Acad. Sci. USA, 98, 5116, 10.1073/pnas.091062498

Vanschoren, 2013, OpenML: Networked science in machine learning, ACM SIGKDD Explor. Newsl., 15, 49, 10.1145/2641190.2641198

Venkatesh, 2019, A review of feature selection and its methods, Cybern. Inf. Technol., 19, 3

Wah, 2018, Feature selection methods: case of filter and wrapper approaches for maximising classification accuracy, Pertanika J. Sci. Technol., 26, 329

Wright, 2017, ranger: A fast implementation of random forests for high dimensional data in C++ and R, J. Stat. Softw., 77, 1, 10.18637/jss.v077.i01

Xue, 2015, A comprehensive comparison on evolutionary feature selection approaches to classification, Int. J. Comput. Intell. Appl., 14, 10.1142/S146902681550008X

Xue, 2016, A survey on evolutionary computation approaches to feature selection, IEEE Trans. Evol. Comput., 20, 606, 10.1109/TEVC.2015.2504420

Yang, 1998, Feature subset selection using a genetic algorithm, 117

Yu, 2004, Efficient feature selection via analysis of relevance and redundancy, J. Mach. Learn. Res., 5, 1205

Zawadzki, 2017

Zhu, 2007, Wrapper-filter feature selection algorithm using a memetic framework, IEEE Trans. Syst. Man Cybern. B, 37, 70, 10.1109/TSMCB.2006.883267