A Review of Hot Deck Imputation for Survey Non‐response

International Statistical Review - Volume 78, Issue 1 - Pages 40-64 - 2010
Rebecca Andridge¹, Roderick J. A. Little²
¹Division of Biostatistics, The Ohio State University, Columbus, OH 43210, USA
²Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA

Abstract

Hot deck imputation is a method for handling missing data in which each missing value is replaced with an observed response from a “similar” unit. Although the hot deck is used extensively in practice, its theory is not as well developed as that of other imputation methods. We have found that no consensus exists as to the best way to apply the hot deck and obtain inferences from the completed data set. Here we review different forms of the hot deck and existing research on its statistical properties. We describe applications of the hot deck currently in use, including the U.S. Census Bureau's hot deck for the Current Population Survey (CPS). We also provide an extended example of variations of the hot deck applied to the third National Health and Nutrition Examination Survey (NHANES III). Finally, we highlight some potential areas for future research.
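To make the basic idea concrete, the following is a minimal sketch of one common variant, a random hot deck within adjustment cells, in which "similar" units are those that share the same values of fully observed covariates. It is an illustration under stated assumptions, not the procedure used by the Census Bureau or in the NHANES III example: the function name `hot_deck_impute`, the variable names, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, target, cell_vars, seed=0):
    """Random hot deck within adjustment cells (illustrative sketch).

    records:   list of dicts; a missing value of `target` is stored as None.
    cell_vars: fully observed variables that define which units are "similar".
    Returns a new list of records; the input list is not modified.
    """
    rng = random.Random(seed)

    # Pool the observed (donor) values of the target by adjustment cell.
    donors = defaultdict(list)
    for r in records:
        if r[target] is not None:
            donors[tuple(r[v] for v in cell_vars)].append(r[target])

    completed = []
    for r in records:
        r = dict(r)  # copy so the original record is left untouched
        if r[target] is None:
            pool = donors.get(tuple(r[v] for v in cell_vars))
            if pool:  # the value stays missing if the cell has no donor
                r[target] = rng.choice(pool)
        completed.append(r)
    return completed


# Made-up toy data, for illustration only.
survey = [
    {"sex": "F", "age_group": "18-39", "income": 41000},
    {"sex": "F", "age_group": "18-39", "income": None},
    {"sex": "F", "age_group": "18-39", "income": 38500},
    {"sex": "M", "age_group": "40-64", "income": 52000},
    {"sex": "M", "age_group": "40-64", "income": None},
    {"sex": "M", "age_group": "65+", "income": None},
]

completed = hot_deck_impute(survey, "income", ["sex", "age_group"])
for row in completed:
    print(row)
```

In this sketch a recipient in a cell with no donors is left missing; in practice such cells are usually collapsed with neighbouring cells or handled with a distance-based (nearest-neighbour) donor search. The sketch also ignores sample weights and does nothing to propagate imputation uncertainty into variance estimates.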
