Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study

American Journal of Epidemiology - Volume 179, Issue 6 - Pages 764-774 - 2014
Anoop D Shah, Jonathan Bartlett¹, James R. Carpenter, Owen Nicholas, Harry Hemingway
¹ Department of Mathematical Sciences

Abstract

Keywords


References

Marston, 2010, Issues in multiple imputation of missing data for large general practice clinical databases, Pharmacoepidemiol Drug Saf, 19, 618, 10.1002/pds.1934

Schafer, 1997, Analysis of Incomplete Multivariate Data, 10.1201/9781439821862

Little, 2002, Statistical Analysis With Missing Data, 2nd ed, 10.1002/9781119013563

van Buuren, 2011, mice: Multivariate Imputation by Chained Equations in R, J Stat Softw, 45, 1

Seaman, 2012, Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods, BMC Med Res Methodol, 12, 46, 10.1186/1471-2288-12-46

Hardt, 2012, Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research, BMC Med Res Methodol, 12, 184, 10.1186/1471-2288-12-184

Burgette, 2010, Multiple imputation for missing data via sequential regression trees, Am J Epidemiol, 172, 1070, 10.1093/aje/kwq260

Breiman, 2001, Random forests, Mach Learn, 45, 5, 10.1023/A:1010933404324

Breiman, 2002, Manual on Setting Up, Using, and Understanding Random Forests V3.1

Dasgupta, 2011, Brief review of regression-based and machine learning methods in genetic epidemiology: the Genetic Analysis Workshop 17 experience, Genet Epidemiol, 35, S5, 10.1002/gepi.20642

Ishwaran, 2004, Relative risk forests for exercise heart rate recovery as a predictor of mortality, J Am Stat Assoc, 99, 591, 10.1198/016214504000000638

Ishwaran, 2008, Random survival forests, Ann Appl Stat, 2, 841, 10.1214/08-AOAS169

Tsuji, 2012, Potential responders to FOLFOX therapy for colorectal cancer by random forests analysis, Br J Cancer, 106, 126, 10.1038/bjc.2011.505

Stekhoven, 2012, MissForest—non-parametric missing value imputation for mixed-type data, Bioinformatics, 28, 112, 10.1093/bioinformatics/btr597

Eisemann, 2011, Imputation of missing values of tumour stage in population-based cancer registration, BMC Med Res Methodol, 11, 129, 10.1186/1471-2288-11-129

Denaxas, 2012, Data resource profile: CArdiovascular disease research using LInked BEspoke studies and electronic Records (CALIBER), Int J Epidemiol, 41, 1625, 10.1093/ije/dys188

Shah, 2013, CALIBERrfimpute: Imputation in MICE using Random Forest

Herrett, 2010, Validation and validity of diagnoses in the General Practice Research Database: a systematic review, Br J Clin Pharmacol, 69, 4, 10.1111/j.1365-2125.2009.03537.x

Health and Social Care Information Centre, 2013, Hospital Episode Statistics

Herrett, 2010, The Myocardial Ischaemia National Audit Project (MINAP), Heart, 96, 1264, 10.1136/hrt.2009.192328

Shah, 2011, Threshold haemoglobin levels and the prognosis of stable coronary disease: two new cohorts and a systematic review and meta-analysis, PLoS Med, 8, e1000439, 10.1371/journal.pmed.1000439

Guasti, 2011, Neutrophils and clinical outcomes in patients with acute coronary syndromes and/or cardiac revascularization: a systematic review on more than 34,000 subjects, Thromb Haemost, 106, 591, 10.1160/TH11-02-0096

Núñez, 2011, Low lymphocyte count and cardiovascular diseases, Curr Med Chem, 18, 3226, 10.2174/092986711796391633

Hertz-Picciotto, 1997, Validity and efficiency of approximation methods for tied survival times in Cox regression, Biometrics, 53, 1151, 10.2307/2533573

White, 2009, Imputing missing covariate values for the Cox model, Stat Med, 28, 1982, 10.1002/sim.3618

Barnard, 1999, Small-sample degrees of freedom with multiple imputation, Biometrika, 86, 948, 10.1093/biomet/86.4.948

R Development Core Team, 2010, R: A Language and Environment for Statistical Computing

Stekhoven, 2012, missForest: Nonparametric Missing Value Imputation using Random Forest

Therneau, 2010, survival: Survival Analysis, Including Penalised Likelihood

Liaw, 2002, Classification and regression by randomForest, R News, 2, 18

Matsumoto, 1998, Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Trans Model Comput Simul, 8, 3, 10.1145/272991.272995

Bartlett, 2012, Multiple Imputation of Covariates by Fully Conditional Specification: Accommodating the Substantive Model

Mendez, 2011, Estimating residual variance in random forest regression, Comput Stat Data Anal, 55, 2937, 10.1016/j.csda.2011.04.022

Rubin, 1996, Multiple imputation after 18+ years, J Am Stat Assoc, 91, 473, 10.1080/01621459.1996.10476908

Marshall, 2010, Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study, BMC Med Res Methodol, 10, 112, 10.1186/1471-2288-10-112

Carpenter, 2011, REALCOM-IMPUTE software for multilevel multiple imputation with mixed response types, J Stat Softw, 45, 1, 10.18637/jss.v045.i05