Multiple imputation of discrete and continuous data by fully conditional specification
Abstract
The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and should incorporate any knowledge about the process that generated the missing data. Two approaches for imputing multivariate data exist: joint modeling (JM) and fully conditional specification (FCS). JM is based on parametric statistical theory and leads to imputation procedures whose statistical properties are known. JM is theoretically sound, but the joint model may lack the flexibility needed to represent typical data features, potentially leading to bias. FCS is a semi-parametric and flexible alternative that specifies the multivariate model through a series of conditional models, one for each incomplete variable. FCS provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish. Simulation work shows that FCS behaves very well in the cases studied. The present paper reviews and compares the two approaches. JM and FCS were applied to pubertal development data of 3801 Dutch girls who had missing data on menarche (two categories), breast development (five categories) and pubic hair development (six stages). Imputations for these data were created under two models: a multivariate normal model with rounding and a conditionally specified discrete model. The JM approach introduced biases in the reference curves, whereas FCS did not. The paper concludes that FCS is a useful, easily applied and flexible alternative to JM when no convenient and realistic joint distribution can be specified.
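The idea of specifying "a series of conditional models, one for each incomplete variable" can be illustrated with a minimal sketch. The code below is not the procedure used in the paper. It assumes that every variable is continuous and that each conditional model is a normal linear regression on all other variables, whereas the paper's application used discrete conditional models. It also ignores uncertainty in the regression parameters, which a proper multiple imputation procedure would draw as well. The function name `fcs_impute` and its parameters are hypothetical.

```python
import numpy as np

def fcs_impute(X, n_iter=10, seed=0):
    """Return one completed copy of X using a simplified FCS loop.

    Assumptions (illustrative only): all columns are continuous, each
    conditional model is a normal linear regression on the other columns,
    and parameter uncertainty is ignored.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Start from random draws of the observed values in each column.
    for j in range(X.shape[1]):
        observed = X[~missing[:, j], j]
        n_mis = int(missing[:, j].sum())
        filled[missing[:, j], j] = rng.choice(observed, size=n_mis)

    for _ in range(n_iter):
        # Visit each incomplete variable in turn, conditioning on the rest.
        for j in range(X.shape[1]):
            mis_rows = missing[:, j]
            if not mis_rows.any():
                continue
            obs_rows = ~mis_rows
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])

            # Fit the conditional model on rows where column j is observed.
            beta, *_ = np.linalg.lstsq(design[obs_rows], filled[obs_rows, j], rcond=None)
            resid = filled[obs_rows, j] - design[obs_rows] @ beta
            dof = max(int(obs_rows.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)

            # Impute by prediction plus random residual noise.
            noise = rng.normal(0.0, sigma, size=int(mis_rows.sum()))
            filled[mis_rows, j] = design[mis_rows] @ beta + noise
    return filled
```

Calling the function several times with different `seed` values would give several completed data sets, which is the starting point for multiple imputation. A categorical variable such as menarche would need a different conditional model in place of the linear regression, for example a logistic one.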