Prediction or interpretability?

Emerging Themes in Epidemiology - Volume 16 - Pages 1-3 - 2019
Stefano Nembrini
Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, Emerging Pathogens Institute, University of Florida, Gainesville, USA

Abstract

The journal published a review of the literature on recursive partitioning in epidemiological research that compares two decision tree methods: classification and regression trees (CARTs) and conditional inference trees (CITs). The paper contains two potential sources of confusion for readers. The first lies in how CITs and CARTs are defined and compared. The second is more general and concerns the use of hyper-parameters and their tuning through resampling techniques.
