Recent advances in feature selection and its applications
Abstract
Keywords
References
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17:491–502
Hughes GF (1968) On the mean accuracy of statistical pattern recognizers. IEEE Trans Inf Theory 14:55–63
Miller AJ (1984) Selection of subsets of regression variables. J R Stat Soc Ser A (Gen) 147:389–425
Blum A, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97:245–271
Inza I, Larranaga P, Blanco R, Cerrolaza AJ (2004) Filter versus wrapper gene selection approaches in DNA microarray domains. Artif Intell Med 31:91–103
Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305
Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
Singh D, Febbo PG, Ross K (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1:203–209
Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 98:13790–13795
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745–6750
Zhao Z (2010) Spectral feature selection for mining ultrahigh dimensional data, Ph.D. thesis. Arizona State University
Guyon I, Gunn S, Nikravesh M, Zadeh L (2006) Feature extraction, foundations and applications. Springer, Physica-Verlag, New York
Tang JL, Alelyani S, Liu H (2014) Feature selection for classification—a review. In: Aggarwal C (ed) Data classification: algorithms and applications. CRC Press, Boca Raton
Li JD, Cheng KW, Wang SH, Morstatter F, Trevino RP, Tang JL, Liu H (2016) Feature selection: a data perspective, vol 3, pp 1–73. arXiv:1601.07996
Peng HC, Long FH, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226–1238
Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24:301–312
Hall MA (2000) Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of international conference on machine learning, pp 359–366
Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of international conference on machine learning, pp 856–863
Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224
Saeys Y, Abeel T, Van de Peer Y (2008) Robust feature selection using ensemble feature selection techniques. In: Proceedings of the 25th European conference on machine learning and knowledge discovery in databases, Banff, pp 313–325
Han Y, Yu L (2010) A variance reduction framework for stable feature selection. In: Proceedings of the international conference on data mining, pp 206–215
Loscalzo S, Yu L, Ding C (2009) Consensus group stable feature selection. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 567–575
Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y (2010) Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26:392–398
Li Y, Gao SY, Chen SC (2012) Ensemble feature weighting based on local learning and diversity. In: AAAI Conference on artificial intelligence, pp 1019–1025
Woznica A, Nguyen P, Kalousis A (2012) Model mining for robust feature selection. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 913–921
Yu L, Han Y, Berens ME (2012) Stable gene selection from microarray data via sample weighting. IEEE/ACM Trans Comput Biol Bioinform 9:262–272
Yu L, Ding C, Loscalzo S (2008) Stable feature selection via dense feature groups. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 803–811
Li Y, Huang SS, Chen SC, Si J (2013) Stable l2-regularized ensemble feature weighting. In: Proceedings of the 11th international workshop on multiple classifier systems, pp 167–178
Li Y, Si J, Zhou GJ, Huang SS, Chen SC (2015) FREL: a stable feature selection algorithm. IEEE Trans Neural Netw Learn Syst 26:1388–1402
Crammer K, Gilad-Bachrach R, Navot A, Tishby N (2002) Margin analysis of the LVQ algorithm. In: Proceedings of advances in neural information processing systems, pp 462–469
Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Stat Methodol) 58:267–288
Ng AY (2004) Feature selection, l1 vs. l2 regularization, and rotational invariance. In: Proceedings of international conference on machine learning, pp 78–85
Jenatton R, Obozinski G, Bach F (2010) Structured sparse principal component analysis. In: Proceedings of international conference on artificial intelligence and statistics
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B (Stat Methodol) 68:49–67
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67:301–320
Kim S, Xing EP (2010) Tree-guided group lasso for multi-task regression with structured sparsity. In: Proceedings of the 27th international conference on machine learning
Wang J, Zhou JY, Liu J, Wonka P, Ye JP (2014) A safe screening rule for sparse logistic regression. In: Proceedings of advances in neural information processing systems, pp 1053–1061
Wang J, Ye JP (2015) Safe screening for multi-task feature learning with multiple data matrices. In: Proceedings of the 32nd international conference on machine learning
Zhao Z, Wang JX, Sharma S, Agarwal N, Liu H, Chang Y (2010) An integrative approach to identifying biologically relevant genes. In: Proceedings of SIAM International conference on data mining
Weinberger K, Dasgupta A, Langford J, Smola A, Attenberg J (2009) Feature hashing for large scale multitask learning. In: Proceedings of international conference on machine learning
Chu CT, Kim SK, Lin YA, Yu YY, Bradski G, Ng A, Olukotun K (2007) Map-reduce for machine learning on multicore. In: Proceedings of advances in neural information processing systems
Snir M, Otto S, Lederman SH, Walker D, Dongarra J (1995) MPI: the complete reference, 1st edn. MIT Press, Cambridge
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107–113
Zhao ZA, Liu H (2012) Spectral feature selection for data mining. Taylor and Francis Group, London
Zhao Z, Zhang RW, Cox J, Duling D, Sarle W (2013) Massively parallel feature selection: an approach based on variance preservation. Mach Learn 92:195–220
Das K, Bhaduri K, Kargupta H (2010) A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks. Knowl Inf Syst 24:341–367
Cao B, He LF, Kong XN, Yu PS, Hao ZF, Ragin AB (2014) Tensor-based multi-view feature selection with applications to brain diseases. In: Proceedings of the 2014 international conference on data mining, pp 40–49
Smalter A, Huan J, Lushington G (2009) Feature selection in the tensor product feature space. In: Proceedings of the 2009 international conference on data mining, pp 1004–1009
Tang JL, Hu X, Gao HJ, Liu H (2013) Unsupervised feature selection for multi-view data in social media. In: Proceedings of the 2013 SIAM conference on data mining
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
Fang Z, Zhang ZM (2013) Discriminative feature selection for multi-view cross-domain learning. In: Proceedings of ACM international conference of information and knowledge management, pp 1321–1330
Chen WZ, Yan J, Zhang BY, Chen Z, Yang Q (2007) Document transformation for multi-label feature selection in text categorization. In: Proceedings of the 7th IEEE conference on data mining, pp 451–456
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. Appl Stat 29:119–127
Yan J, Liu N, Zhang B, Yan S, Chen Z, Cheng Q, Fan W, Ma WY (2005) OCFS: optimal orthogonal centroid feature selection for text categorization. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp 122–129
Lastra G, Luaces O, Quevedo JR, Bahamonde A (2011) Graphical feature selection for multilabel classification tasks. In: Proceedings of the 10th international conference on advances in intelligent data analysis, pp 281–305
Kong X, Yu PS (2012) gMLC: a multi-label feature selection framework for graph classification. Knowl Inf Syst 31:281–305
Gu QQ, Li ZH, Han JW (2011) Correlated multi-label feature selection. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 1087–1096
Elisseeff A, Weston J (2001) A kernel method for multi-labelled classification. In: Advances in neural information processing systems, pp 681–687
Yan P, Li Y (2016) Graph-margin based multi-label feature selection. In: European conference on machine learning, pp 540–555
Perkins S, Theiler J (2003) Online feature selection using grafting. In: Proceedings of international conference on machine learning, pp 592–599
Wu X, Yu K, Wang H, Ding W (2010) Online streaming feature selection. In: Proceedings of international conference on machine learning, pp 1159–1166
Zhou D, Huang J, Scholkopf B (2005) Learning from labeled and unlabeled data on a directed graph. In: Proceedings of international conference on machine learning, pp 1036–1043
Yu K, Wu XD, Ding W, Pei J (2014) Towards scalable and accurate online feature selection for big data. In: Proceedings of IEEE conference on data mining, pp 660–669
Sengupta D, Bandyopadhyay S, Sinha D (2017) A scoring scheme for online feature selection: simulating model performance without retraining. IEEE Trans Neural Netw Learn Syst 28:405–414
Wang J, Zhao ZQ, Hu XG, Cheung YM, Wang M, Wu XD (2013) Online group feature selection. In: Proceedings of international joint conference on artificial intelligence
Wang J, Zhao P, Hoi S, Jin R (2014) Online feature selection and its applications. IEEE Trans Knowl Data Eng 26:698–710
Zhang Q, Zhang P, Long G, Ding W, Zhang C, Wu X (2015) Towards mining trapezoidal data streams. In: Proceedings of IEEE international conference on data mining, pp 1111–1116
Avidan S, Butman M (2006) Efficient methods for privacy preserving face detection. In: Advances in neural information processing systems, pp 57–64
Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407
Zhou Q, Zhou H, Li T (2016) Cost-sensitive feature selection using random forest: selecting low-cost subsets of informative features. Knowl-Based Syst 95:1–11
Dwork C (2006) Differential privacy. In: Proceedings of international colloquium on automata, languages and programming, pp 1–12
Yang J, Li Y (2014) Differential privacy feature selection. In: Proceedings of international joint conference on neural networks, pp 4182–4189
Li Y, Yang J, Ji W (2016) Local learning-based feature weighting with privacy preservation. Neurocomputing 174:1107–1115
Sun YJ, Todorovic S, Goodison S (2010) Local learning based feature selection for high dimensional data analysis. IEEE Trans Pattern Anal Mach Intell 32:1610–1626
Barreno M, Nelson B, Joseph AD, Tygar JD (2010) The security of machine learning. Mach Learn 81:121–148
Huang L, Joseph AD, Nelson B, Rubinstein BIP, Tygar JD (2011) Adversarial machine learning. In: Proceedings of 4th ACM workshop on artificial intelligence and security, pp 43–58
Biggio B, Fumera G, Roli F (2014) Security evaluation of pattern classifiers under attack. IEEE Trans Knowl Data Eng 26:984–996
Li B, Vorobeychik Y (2014) Feature cross-substitution in adversarial classification. In: Proceedings of advances in neural information processing systems, pp 2087–2095
Xiao H, Biggio B, Brown G, Fumera G, Eckert C, Roli F (2015) Is feature selection secure against training data poisoning? In: Proceedings of the 32nd international conference on machine learning
Zhang F, Chan PPK, Biggio B, Yeung DS, Roli F (2015) Adversarial feature selection against evasion attacks. IEEE Trans Cybern 46:766–777
Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23:2507–2517
Bolon-Canedo V, Sanchez-Marono N, Alonso-Betanzos A, Benitez JM, Herrera F (2014) A review of microarray datasets and applied feature selection methods. Inf Sci 282:111–135
Nie FP, Huang H, Cai X, Ding C (2010) Efficient and robust feature selection via joint l21-norms minimization. Adv Neural Inf Process Syst 23:1813–1821
Tang JL, Liu H (2012) Feature selection with linked data in social media. In: SIAM international conference on data mining
Tang JL, Liu H (2012) Unsupervised feature selection for linked social media data. In: Eighteenth ACM SIGKDD international conference on knowledge discovery and data mining
Tang JL, Liu H (2014) An unsupervised feature selection framework for social media data. IEEE Trans Knowl Data Eng 26:2914–2927
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113-1-026113-15
Li JD, Tang JL, Hu X, Liu H (2015) Unsupervised streaming feature selection in social media. In: Proceedings of ACM international conference of information and knowledge management
Wu F, Han YH, Liu X, Shao J, Zhuang YT, Zhang ZF (2012) The heterogeneous feature selection with structural sparsity for multimedia annotation and hashing: a survey. Int J Multimed Inf Retr 1:3–15
Wright J, Yang A, Ganesh A, Sastry S, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31:210–227
Jiang W, Er GH, Dai QH, Gu JW (2006) Similarity-based online feature selection in content-based image retrieval. IEEE Trans Image Process 15:702–712
Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407
Khoshgoftaar TM, Gao KH, Napolitano A, Wald R (2014) A comparative study of iterative and non-iterative feature selection techniques for software defect prediction. Inf Syst Front 16:801–822
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313:504–507
Zhao L, Hu Q, Wang W (2015) Heterogeneous feature selection with multi-modal deep neural networks and sparse group lasso. IEEE Trans Multimed 17:1936–1948