An unsupervised approach to feature discretization and selection

Pattern Recognition - Volume 45 - Pages 3048-3060 - 2012
Artur J. Ferreira (1, 2), Mário A.T. Figueiredo (3, 2)
(1) Instituto Superior de Engenharia de Lisboa, Polytechnic Institute of Lisbon, Portugal
(2) Instituto de Telecomunicações, Lisboa, Portugal
(3) Instituto Superior Técnico, Technical University of Lisbon, Portugal

References

Aha, 1991, Instance-based learning algorithms, Machine Learning, 6, 37, 10.1007/BF00153759
V. Bolon-Canedo, S. Seth, N. Sanchez-Marono, A. Alonso-Betanzos, J. Principe, Statistical dependence measure for feature selection in microarray datasets, in: 19th European Symposium on Artificial Neural Networks-ESANN'2011, Belgium, 2011, pp. 23–28.
Boser, 1992, A training algorithm for optimal margin classifiers, 144
Clarke, 2000, Entropy and MDL discretization of continuous variables for Bayesian belief networks, International Journal of Intelligent Systems, 15, 61, 10.1002/(SICI)1098-111X(200001)15:1<61::AID-INT4>3.0.CO;2-O
Cover, 1991
Cristianini, 2000
Demsar, 2006, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, 7, 1
Dougherty, 1995, Supervised and unsupervised discretization of continuous features, 194
Duda, 2001
R. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. Ridder, D. Tax, S. Verzakov, PRTools4.1, a Matlab Toolbox for Pattern Recognition, Technical Report, Delft University of Technology, 2007.
Escolano, 2009
Fang, 2011, Integrative gene selection for classification of microarray data, Computer and Information Science, 4, 55
U. Fayyad, K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, in: Proceedings of the International Joint Conference on Uncertainty in Artificial Intelligence, 1993, pp. 1022–1027.
A. Ferreira, M. Figueiredo, Feature transformation and reduction for text classification, in: 10th International Workshop on Pattern Recognition and Information Systems-PRIS'2010, 2010, pp. 72–81.
A. Ferreira, M. Figueiredo, Unsupervised feature selection for sparse data, in: 19th European Symposium on Artificial Neural Networks-ESANN'2011, 2011, pp. 339–344.
Forman, 2003, An extensive empirical study of feature selection metrics for text classification, Journal of Machine Learning Research, 3, 1289
A. Frank, A. Asuncion, UCI machine learning repository, 2010 〈http://archive.ics.uci.edu/ml〉.
Friedman, 1937, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, Journal of the American Statistical Association, 32, 675, 10.1080/01621459.1937.10503522
Friedman, 1940, A comparison of alternative tests of significance for the problem of m rankings, The Annals of Mathematical Statistics, 11, 86, 10.1214/aoms/1177731944
Furey, 2000, Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, 16, 906, 10.1093/bioinformatics/16.10.906
Guyon, 2003, An introduction to variable and feature selection, Journal of Machine Learning Research, 3, 1157
2006
Hastie, 2001
Ho, 1998, The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 832, 10.1109/34.709601
Huerta, 2006, A hybrid GA/SVM approach for gene selection and classification of microarray data, 34
Joachims, 2001
Kohavi, 1997, Wrappers for feature subset selection, Artificial Intelligence, 97, 273, 10.1016/S0004-3702(97)00043-X
Lai, 2006, Random subspace method for multivariate feature selection, Pattern Recognition Letters, 27, 1067, 10.1016/j.patrec.2005.12.018
Lee, 2008, An integrated algorithm for gene selection and classification applied to microarray data of ovarian cancer, Artificial Intelligence in Medicine, 42, 81, 10.1016/j.artmed.2007.09.004
Linde, 1980, An algorithm for vector quantizer design, IEEE Transactions on Communications, 28, 84, 10.1109/TCOM.1980.1094577
Liu, 2002, Discretization: an enabling technique, Data Mining and Knowledge Discovery, 6, 393, 10.1023/A:1016304305535
L. Liu, J. Kang, J. Yu, Z. Wang, A comparative study on unsupervised feature selection methods for text clustering, in: IEEE International Conference on Natural Language Processing and Knowledge Engineering, 2005, pp. 597–601.
Manning, 2008
Meyer, 2008, Information-theoretic feature selection in microarray data using variable complementarity, IEEE Journal of Selected Topics in Signal Processing (Special Issue on Genomic and Proteomic Signal Processing), 2, 261, 10.1109/JSTSP.2008.923858
Mitra, 2002, Unsupervised feature selection using feature similarity, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 301, 10.1109/34.990133
Peng, 2005, Feature selection based on mutual information: Criteria of max-dependency, max-relevance and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1226, 10.1109/TPAMI.2005.159
Saeys, 2007, A review of feature selection techniques in bioinformatics, Bioinformatics, 23, 2507, 10.1093/bioinformatics/btm344
Statnikov, 2005, A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis, Bioinformatics, 21, 631, 10.1093/bioinformatics/bti033
Tsai, 2008, A discretization algorithm based on class-attribute contingency coefficient, Information Sciences, 178, 714, 10.1016/j.ins.2007.09.004
Vapnik, 1999
Webb, 2005, Not so naive Bayes: aggregating one-dependence estimators, Machine Learning, 58, 5, 10.1007/s10994-005-4258-6
Witten, 2005
Yan, 2009, A formal study of feature selection in text categorization, Journal of Communication and Computer, 6, 32
L. Yu, H. Liu, Feature selection for high-dimensional data: a fast correlation-based filter solution, in: Proceedings of International Conference on Machine Learning-ICML'03, 2003, pp. 856–863.
Yu, 2004, Efficient feature selection via analysis of relevance and redundancy, Journal of Machine Learning Research, 5, 1205
Q. Zhu, L. Lin, M. Shyu, S. Chen, Effective supervised discretization for classification based on correlation maximization, in: IEEE International Conference on Information Reuse and Integration-IRI'2011, 2011, pp. 390–395.
Zien, 2009, The feature importance ranking measure, vol. 5782, 694