Unbiased split selection for classification trees based on the Gini Index

Computational Statistics and Data Analysis - Tập 52 - Trang 483-501 - 2007
Carolin Strobl1, Anne-Laure Boulesteix2, Thomas Augustin1
1Department of Statistics, University of Munich, Ludwigstr. 33, 80539 Munich, Germany
2Department of Medical Statistics and Epidemiology, Technical University of Munich, Ismaningerstr. 22, 81675 Munich, Germany

Tài liệu tham khảo

Benjamini, 1995, Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. Roy. Statist. Soc. B, 57, 289 Bittencourt, 2004, Feature selection by using classification and regression trees, 66 Boulesteix, 2006, Maximally selected chi-square statistics and binary splits of nominal variables, Biometrical J., 48, 838, 10.1002/bimj.200510191 Boulesteix, 2006, Maximally selected chi-square statistics for ordinal variables, Biometrical J., 48, 451, 10.1002/bimj.200510161 Boulesteix, 2006, Identification of interaction patterns and classification with applications to microarray data, Comput. Statist. Data Anal., 50, 783, 10.1016/j.csda.2004.10.004 Breiman, 2001, Random forests, Mach. Learn., 45, 5, 10.1023/A:1010933404324 Breiman, 1984 Dobra, 2001, Bias correction in classification tree construction, 90 Dong, 1999, Efficient mining of emerging patterns: discovering trends and differences, 43 Evans, 1993 Hawkins, D.M., 1997. Firm: formal inference-based recursive modeling. Release 2.1, Technical Report 546, School of Statistics, University of Minnesota, MN, USA. Hothorn, 2006, Unbiased recursive partitioning: a conditional inference framework, J. Comput. Graph. Statist., 15, 651, 10.1198/106186006X133933 Jong, 2005, Estimating neuronal variable importance with random forest, 33 Kass, 1980, An exploratory technique for investigating large quantities of categorical data, Appl. Statist., 29, 119, 10.2307/2986296 Kim, 2001, Classification trees with unbiased multiway splits, J. Amer. Statist. Assoc., 96, 589, 10.1198/016214501753168271 Kononenko, 1995, On biases in estimating multi-valued attributes, 1034 Koziol, 1991, On maximally selected chi-square statistics, Biometrics, 47, 1557, 10.2307/2532406 Little, 1986 Little, 2002 Liu, 1997, Techniques for dealing with missing values in classification, 527 Loh, 2002, Regression trees with unbiased variable selection and interaction detection, Statist. Sinica, 12, 361 Loh, 1997, Split selection methods for classification trees, Statist. Sinica, 7, 815 Miller, 1982, Maximally selected Chi square statistics, Biometrics, 38, 1011, 10.2307/2529881 Quinlan, 1986, Induction of decision trees, Mach. Learn., 1, 81, 10.1007/BF00116251 Quinlan, 1993 R Development Core Team, 2006. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 〈http://www.R-project.org〉. Schmaußer, M., 2005. Auswirkungen verschiedener Stoffwechsellagen auf die Fertilität beim Milchrind unter besonderer Berücksichtigung der individuellen Futteraufnahme und unter Berücksichtigung verschiedener Melksysteme. Ph.D. Thesis, Faculty of Veterinary Medicine, University of Munich LMU, Munich, Germany. Shih, 2004, A note on split selection bias in classification trees, Comput. Statist. Data Anal., 45, 457, 10.1016/S0167-9473(03)00064-1 Shih, 2004, Variable selection bias in regression trees with constant fits, Comput. Statist. Data Anal., 45, 595, 10.1016/S0167-9473(03)00036-7 Strobl, C., 2005. Variable selection in classification trees based on imprecise probabilities. In: Cozman, F., Nau, R., Seidenfeld, T. (Eds.), Proceedings of the Fourth International Symposium on Imprecise Probabilities and their Applications, Carnegy Mellon University, Pittsburgh, PA, USA, SIPTA, Manno, pp. 340–348. Strobl, C., Boulesteix, A.-L., Zeileis, A., Hothorn, T., 2006. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics, accepted for publication. White, 1994, Bias in information based measures in decision tree induction, Mach. Learn., 15, 321, 10.1007/BF00993349