Efficient ways to impute incomplete panel data
Abstract
We find that the multiple imputation procedures currently implemented in major statistical packages, and therefore available to the vast majority of data analysts, are limited in their ability to handle incomplete panel data. We review various missing data methods that we deem useful for the analysis of incomplete panel data and discuss how some of the shortcomings of existing procedures can be overcome. In a simulation study based on real panel data, we illustrate the quality of these procedures and outline fruitful avenues for future research.