Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies
Abstract
Phenotype error reduces the power to detect genetic association. We quantify the effect of phenotype error, also known as diagnostic error, on power and sample size calculations for case-control genetic association studies between a marker locus and a disease phenotype. We use the classic Pearson chi-square test of independence as our test of genetic association. To determine asymptotic power analytically, we compute the non-centrality parameter of the test statistic's distribution, which is a function of the case and control sample sizes, genotype frequencies, disease prevalence, and phenotype misclassification probabilities. We derive the non-centrality parameter in the presence of phenotype errors, along with the corresponding formulas for misclassification cost (the percentage increase in minimum sample size needed to maintain constant asymptotic power at a fixed significance level for each percentage increase in a given misclassification parameter). We use a linear Taylor series approximation for the cost of phenotype misclassification to determine lower bounds for the relative costs of misclassifying a truly affected individual as a control and a truly unaffected individual as a case. Analytic power is verified by computer simulation. Our major findings are that: (i) the median absolute difference between analytic power computed with our method and simulation power was 0.001, and the absolute difference never exceeded 0.011; (ii) as the disease prevalence approaches 0, the cost of misclassifying an unaffected individual as a case becomes infinitely large, while the cost of misclassifying an affected individual as a control approaches 0. Our work enables researchers to quantify power loss and minimum sample size requirements in the presence of phenotype errors, thereby allowing for more realistic study design. For most diseases of current interest, verifying that cases are correctly classified is of paramount importance.
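The dependence of asymptotic power on prevalence and misclassification probabilities can be illustrated with a minimal sketch. It is not the authors' derivation. It rests on assumptions that the abstract does not state: the names `theta` (probability that a truly affected individual is classified as a control) and `phi` (probability that a truly unaffected individual is classified as a case) are hypothetical; errors are assumed to act on the general population before cases and controls are sampled; the non-centrality parameter is approximated by substituting expected genotype frequencies into the Pearson statistic for a 2 x m table; and power is evaluated only for three genotypes (2 degrees of freedom), where the non-central chi-square tail has a simple series form.

```python
import math


def observed_freqs(p_aff, p_unaff, prevalence, theta, phi):
    """Genotype frequencies among individuals *classified* as cases/controls.

    Assumed error model (hypothetical notation, not from the source):
      theta = Pr(classified as control | truly affected)
      phi   = Pr(classified as case    | truly unaffected)
    Requires that both classification probabilities below are non-zero.
    """
    k = prevalence
    w_case = (1 - theta) * k + phi * (1 - k)   # Pr(classified as case)
    w_ctrl = theta * k + (1 - phi) * (1 - k)   # Pr(classified as control)
    case = [((1 - theta) * k * a + phi * (1 - k) * u) / w_case
            for a, u in zip(p_aff, p_unaff)]
    ctrl = [(theta * k * a + (1 - phi) * (1 - k) * u) / w_ctrl
            for a, u in zip(p_aff, p_unaff)]
    return case, ctrl


def noncentrality(case, ctrl, n_case, n_ctrl):
    """Pearson chi-square statistic evaluated at expected frequencies."""
    total = 0.0
    for c, u in zip(case, ctrl):
        denom = n_case * c + n_ctrl * u
        if denom > 0:                          # skip genotypes absent in both groups
            total += (c - u) ** 2 / denom
    return n_case * n_ctrl * total


def power_df2(lam, alpha=0.05, n_terms=500):
    """Upper tail of a non-central chi-square with 2 df at the level-alpha cutoff.

    Uses the Poisson mixture of central chi-squares with 2 + 2j df, whose
    tails have a closed form. Intended for moderate lam (a few hundred at most).
    """
    half_c = -math.log(alpha)                  # critical value / 2 when df = 2
    half_l = lam / 2.0
    pois = math.exp(-half_l)                   # Poisson weight for j = 0
    term = 1.0                                 # (c/2)^i / i! for i = 0
    tail_sum = 1.0                             # sum of terms for i = 0..j
    power = 0.0
    for j in range(n_terms):
        power += pois * math.exp(-half_c) * tail_sum
        pois *= half_l / (j + 1)
        term *= half_c / (j + 1)
        tail_sum += term
    return power


if __name__ == "__main__":
    # Illustrative inputs only; these numbers are not taken from the paper.
    p_aff, p_unaff = [0.30, 0.50, 0.20], [0.25, 0.50, 0.25]
    for theta, phi in [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05)]:
        case, ctrl = observed_freqs(p_aff, p_unaff, 0.05, theta, phi)
        lam = noncentrality(case, ctrl, 500, 500)
        print(theta, phi, round(lam, 3), round(power_df2(lam), 3))
```

Under this assumed model, with a rare disease a small `phi` lets true unaffected individuals make up a large share of the classified cases, which is consistent with the abstract's finding that misclassifying unaffected individuals as cases becomes increasingly costly as prevalence falls.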