Overfitting in prediction models – Is it a problem only in high dimensions?
References
Simon, 2012, Clinical trials for predictive medicine, Stat Med, 31, 3031, 10.1002/sim.5401
Simon, 2003, Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification, J Natl Cancer Inst, 95, 14, 10.1093/jnci/95.1.14
Ambroise, 2002, Selection bias in gene extraction on the basis of microarray gene-expression data, Proc Natl Acad Sci USA, 99, 6562, 10.1073/pnas.102102699
Harrell, 1996, Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors, Stat Med, 15, 361, 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4
Hastie, 2009
Concato, 1993, The risk of determining risk with multivariable models, Ann Intern Med, 118, 201, 10.7326/0003-4819-118-3-199302010-00009
Babyak, 2004, What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models, Psychosom Med, 66, 411
Concato, 1995, Importance of events per independent variable in proportional hazards analysis. I. Background, goals, and general strategy, J Clin Epidemiol, 48, 1495, 10.1016/0895-4356(95)00510-2
Peduzzi, 1995, Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates, J Clin Epidemiol, 48, 1503, 10.1016/0895-4356(95)00048-8
Simon, 2004
Dudoit, 2002, Comparison of discrimination methods for the classification of tumors using gene-expression data, J Am Stat Assoc, 97, 77, 10.1198/016214502753479248
Tibshirani, 2002, Diagnosis of multiple cancer types by shrunken centroids of gene expression, Proc Natl Acad Sci USA, 99, 6567, 10.1073/pnas.082099299
R Core Team, 2012, R: A language and environment for statistical computing
Maechler
Hastie