Comparison of missing data approaches in linkage analysis

BMC Genetics - Volume 4 - Pages 1-4 - 2003
Chao Xing1, Fredrick R Schumacher1, David V Conti2, John S Witte1,3
1Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, USA
2Department of Preventive Medicine, University of Southern California, Los Angeles, USA
3Department of Epidemiology and Biostatistics, and Urology, University of California, San Francisco, USA

Abstract

Observational cohort studies have been little used in linkage analyses because they generally lack large, disease-specific pedigrees. Nevertheless, the longitudinal nature of such studies makes them potentially valuable for assessing linkage between genotypes and temporal trends in phenotypes. The repeated phenotype measures collected across time in cohort studies, however, can have extensive missing information. Existing methods for handling missing data in observational studies may decrease efficiency, introduce biases, and give spurious results, and the impact of such methods on linkage analysis of cohort studies is unclear. We therefore compare the effect of six methods of handling missing repeated phenotypes on the results of genome-wide linkage analyses of four quantitative traits from the Framingham Heart Study cohort. We found that simply deleting observations with missing values gave many more nominally statistically significant linkages than the other five approaches. Among the latter, methods sharing a similar underlying methodology (i.e., imputation-based versus model-based) gave the most consistent results, although some discrepancies remained. Different methods for addressing missing values in linkage analyses of cohort studies can give substantially different results, and must be chosen carefully to protect against biases and spurious findings.
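To make the contrast between deleting incomplete observations and imputing them concrete, the following is a minimal Python sketch on made-up data. The subject identifiers, the phenotype values, and the helper names `complete_case` and `mean_impute` are all hypothetical. Per-exam mean imputation is used here only as a simple stand-in for an imputation-based approach; it is an assumption and not necessarily one of the six methods compared in the study.

```python
from statistics import mean

# Hypothetical toy data: subject id -> phenotype at three exams (None = missing).
phenotypes = {
    "s1": [120.0, 124.0, 130.0],
    "s2": [110.0, None, 118.0],
    "s3": [140.0, 150.0, None],
    "s4": [100.0, 104.0, 108.0],
}


def complete_case(data):
    """Keep only subjects observed at every exam (deletion of incomplete records)."""
    return {sid: list(vals) for sid, vals in data.items() if None not in vals}


def mean_impute(data):
    """Replace each missing value with the mean of the observed values at that exam."""
    n_exams = len(next(iter(data.values())))
    exam_means = [
        mean(vals[j] for vals in data.values() if vals[j] is not None)
        for j in range(n_exams)
    ]
    return {
        sid: [exam_means[j] if v is None else v for j, v in enumerate(vals)]
        for sid, vals in data.items()
    }


deleted = complete_case(phenotypes)  # fewer subjects, no filled-in values
imputed = mean_impute(phenotypes)    # all subjects kept, gaps filled in

print(len(deleted), "subjects after deletion;", len(imputed), "after imputation")
```

In this toy case deletion halves the sample while imputation keeps every subject, which illustrates why the two families of approaches can feed quite different inputs into a downstream linkage analysis.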
