Multiple imputation: review of theory, implementation and software

Statistics in Medicine, Vol. 26, No. 16, pp. 3057-3077, 2007
Ofer Harel (1), Xiao‐Hua Zhou (2, 3)
(1) Department of Statistics, University of Connecticut, 215 Glenbrook Road, Unit 4120, Storrs, CT 06269-4120, U.S.A.
(2) Department of Biostatistics, School of Public Health, University of Washington, F600 Health Sciences, Box 357232, Seattle, WA 98195-7232, U.S.A.
(3) HSR&D Center of Excellence, VA Puget Sound Health Care System, 1660 South Columbian Way, 1/424, Seattle, WA 98108, U.S.A.

Abstract

Missing data is a common complication in data analysis. In many medical settings, missing data can cause difficulties in estimation, precision and inference. Multiple imputation (MI) (Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley: New York, 1987) is a simulation‐based approach to dealing with incomplete data. Although there are many methods for handling incomplete data, MI has become one of the leading ones, and since the late 1980s there has been a steady increase in its use and in MI‐related publications. This tutorial does not attempt to cover all material on MI; rather, it provides an overview that brings together the theory behind MI and its implementation, and discusses the growing possibilities for applying MI with commercial and free software. We illustrate some of the major points using an example from an Alzheimer disease (AD) study, in which clinical data are available for all subjects, but postmortem data are available only for the subset of subjects who died and underwent an autopsy. Analysis of incomplete data requires making unverifiable assumptions; these assumptions are discussed in detail in the text. Relevant S‐Plus code is provided. Copyright © 2007 John Wiley & Sons, Ltd.
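MI analyses each of the m completed (imputed) datasets separately and then pools the m results. As a minimal sketch, assuming a scalar estimand and the standard combining rules from Rubin (1987), the pooling step could look like this in Python. The helper name `pool_rubin` is hypothetical; it is not the S‐Plus code that accompanies the tutorial.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Hypothetical helper: combine m completed-data results with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed dataset
    variances: their squared standard errors U_1..U_m
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 paired estimates and variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance of q_bar
    if b == 0:
        df = math.inf                # no spread across imputations: large-sample df
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


est, total_var, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, total_var, df)
```

With estimates 1, 2 and 3 and equal completed-data variances of 0.5, this gives a pooled estimate of 2.0, a total variance of about 1.83 and roughly 3.78 degrees of freedom; the (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations.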

References

Little RJA, 1987, Statistical Analysis with Missing Data

10.1002/9780470316696

10.1080/01621459.1996.10476908

10.1201/9781439821862

10.4135/9781412985079

10.1177/096228029900800102

10.1111/1467-9574.00218

10.1214/aos/1176351053

10.1093/biomet/87.1.113

10.1093/biomet/86.4.948

10.1214/009053604000000175

10.1037/1082-989X.6.4.330

10.1214/ss/1177010269

10.1002/sim.1475

10.2307/2291315

10.1093/biomet/63.3.581

10.2307/2986113

Heckman J, 1976, The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models, Annals of Economic and Social Measurement, 5, 475

10.1016/0304-4076(84)90074-5

10.1002/(SICI)1097-0258(19970215)16:3<259::AID-SIM484>3.0.CO;2-S

10.1002/sim.1728

10.1080/01621459.1993.10594302

10.2307/2533322

10.1111/j.0006-341X.2000.00667.x

10.2307/2290525

Li K‐H, 1991, Significance levels from repeated p‐values with multiply‐imputed data, Statistica Sinica, 1, 65

10.1093/biomet/79.1.103

10.1080/01621459.1987.10478458

10.1109/TPAMI.1984.4767596

10.1080/01621459.1986.10478280

10.2307/2347902

10.1002/sim.689

10.2307/2289460

Beaton AE, 1964, The use of special matrix operations in statistical calculus, Research Bulletin RB‐64‐51, Educational Testing Service, Princeton, NJ

10.1214/aoms/1177705052

Schafer JL, 2000, Multiple Imputation with PAN (Software)

10.2307/2529876

Schimert J, 2001, Analyzing Missing Values in S‐PLUS

Dempster AP, 1977, Maximum likelihood estimation from incomplete data via the EM algorithm (with Discussion), Journal of the Royal Statistical Society, Series B, 39, 1

Yuan YC, 2000, Multiple imputation for missing data: concepts and new developments, Proceedings of the Twenty‐Fifth Annual SAS Users Group International Conference, Paper 267

SOLAS, 2001, SOLAS

10.1016/0006-3223(96)84228-4

StataCorp, 2003, Stata Statistical Software: Release 8 (Software), StataCorp LP, College Station, TX

van Buuren S, Oudshoorn CGM, 1999, Flexible multivariate imputation by mice, TNO Preventie en Gezondheid, Leiden, TNO/VGZ/PG 99.054

10.1002/(SICI)1097-0258(19990330)18:6<681::AID-SIM71>3.0.CO;2-R

Schafer JL, 1999, NORM: Multiple Imputation of Incomplete Multivariate Data Under a Normal Model, Version 2

Schafer JL, 1997, Technical Report

Raghunathan TE, 2000, IVEware: Imputation and Variance Estimation Software Installation instruction and User Guide

Raghunathan TE, 2001, A multivariate technique for multiply imputing missing values using a sequence of regression models, Survey Methodology, 27, 85

Graham JW, 1993, EMCOV.EXE User's Guide (Computer Program and Manual)

Honaker J, 2001, Amelia: A Program of Missing Data (Windows Version)

10.1017/S0003055401000235

10.1037/1082-989X.10.1.84

Collins LM, 1999, WinLTA User's Guide Part 1

Collins LM, 2001, WinLTA User's Guide for Data Augmentation

10.1198/000313001317098266

10.1002/sim.2494

Rubin DB, Schenker N, 1987, Logit‐based interval estimation for binomial data using the Jeffreys prior, Sociological Methodology, 131–144

10.2307/2530820