Improved Semiparametric Analysis of Polygenic Gene–Environment Interactions in Case–Control Studies

Statistics in Biosciences - Volume 13 - Pages 386–401 - 2020
Tianying Wang¹, Alex Asher²
¹Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
²StataCorp LLC, College Station, USA

Abstract

Standard logistic regression analysis of case–control data has low power to detect gene–environment interactions, but until recently it was the only method that could be applied to complex polygenic data, for which parametric distributional models are not feasible. Under the assumption of gene–environment independence in the underlying population, Stalder et al. (Biometrika, 104:801–812, 2017) developed a retrospective method that treats both the genetic and the environmental variables nonparametrically. However, their method overlooks the mathematical symmetry between the genetic and environmental variables. We propose an improvement to the method of Stalder et al. that increases the efficiency of the estimates with no additional assumptions and at modest computational cost. The improvement treats the genetic and environmental variables symmetrically, yielding two sets of parameter estimates that are then combined into a single, more efficient estimate. We use a semiparametric framework to develop the asymptotic theory of the estimator, show its asymptotic efficiency gain, and evaluate its performance via simulation studies. The method is illustrated using data from a case–control study of breast cancer.
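The abstract does not spell out how the two sets of estimates are combined. As a minimal sketch, assuming both estimators target the same parameter vector β and that an estimate of their joint asymptotic covariance Σ is available, the standard minimum-variance (generalized least squares) combination stacks them as θ̂ = (β̂₁, β̂₂), sets A = (I, I)ᵀ, and returns β̂ = (AᵀΣ⁻¹A)⁻¹AᵀΣ⁻¹θ̂ with covariance (AᵀΣ⁻¹A)⁻¹. This is a generic illustration, not necessarily the exact combination rule used by the authors; the function name `combine_estimates` is hypothetical.

```python
import numpy as np


def combine_estimates(beta1, beta2, sigma):
    """Hypothetical helper: minimum-variance linear combination of two
    estimators of the same parameter vector (generic GLS sketch, not
    necessarily the authors' exact rule).

    beta1, beta2 : length-p estimates of the same parameter vector.
    sigma        : (2p x 2p) joint asymptotic covariance of (beta1, beta2),
                   including the cross-covariance blocks.
    Returns (combined_estimate, combined_covariance).
    """
    beta1 = np.asarray(beta1, dtype=float).ravel()
    beta2 = np.asarray(beta2, dtype=float).ravel()
    sigma = np.asarray(sigma, dtype=float)
    p = beta1.size
    if beta2.size != p or sigma.shape != (2 * p, 2 * p):
        raise ValueError("beta1, beta2 must have length p and sigma shape (2p, 2p)")

    # Stack the two estimates; both have mean beta, so E[theta] = A @ beta.
    theta = np.concatenate([beta1, beta2])
    A = np.vstack([np.eye(p), np.eye(p)])

    # Solve with sigma instead of forming its inverse explicitly.
    sigma_inv_A = np.linalg.solve(sigma, A)
    sigma_inv_theta = np.linalg.solve(sigma, theta)

    # Information of the combined estimator and its covariance.
    info = A.T @ sigma_inv_A
    cov = np.linalg.inv(info)
    beta = cov @ (A.T @ sigma_inv_theta)
    return beta, cov
```

For example, two uncorrelated scalar estimates 1.0 and 2.0 with variances 1 and 4 combine to 1.2 with variance 0.8, which is smaller than either input variance. Because both estimates are computed from the same data they are generally correlated, which is why the sketch uses the full joint covariance rather than the two marginal variances alone.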

References

Andersen SW, Trentham-Dietz A, Gangnon RE, Hampton JM, Skinner HG, Engelman CD, Klein BE, Titus LJ, Egan KM, Newcomb PA (2014) Breast cancer susceptibility loci in association with age at menarche, age at natural menopause and the reproductive lifespan. Cancer Epidemiol 38:62–65
Anderson WF, Matsuno RK, Sherman ME, Lissowska J, Gail MH, Brinton LA, Yang XR, Peplonska B, Chen BE, Rosenberg PS, Chatterjee N, Szeszenia-Dabrowska N, Bardin-Mikolajczak A, Zatonski W, Devesa SS, García-Closas M (2007) Estimating age-specific breast cancer risks: a descriptive tool to identify age interactions. Cancer Causes Control 18:439–447
Breslow NE, Robins JM, Wellner JA (2000) On the semi-parametric efficiency of logistic regression under case–control sampling. Bernoulli 6:447–455
Canzian F, Cox DG, Setiawan VW, Stram DO, Ziegler RG, Dossus L, Beckmann L, Blanché H, Barricarte A, Berg CD et al (2010) Comprehensive analysis of common genetic variation in 61 genes related to steroid hormone and insulin-like growth factor-I metabolism and breast cancer risk in the NCI Breast and Prostate Cancer Cohort Consortium. Hum Mol Genet 19:3873–3884
Chatterjee N, Carroll RJ (2005) Semiparametric maximum likelihood estimation in case–control studies of gene–environment interactions. Biometrika 92:399–418
Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S (2006) Powerful multilocus tests of genetic association in the presence of gene–gene and gene–environment interactions. Am J Hum Genet 79:1002–1016
Chatterjee N, Wheeler B, Sampson J, Hartge P, Chanock SJ, Park J-H (2013) Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat Genet 45:400–405
Chatterjee N, Shi J, García-Closas M (2016) Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat Rev Genet 17:392–406
Consortium TGP (2015) A global reference for human genetic variation. Nature 526:68–74
Dudbridge F (2013) Power and predictive accuracy of polygenic risk scores. PLoS Genet 9:e1003348
Elks CE, Perry JRB, Sulem P, Chasman DI, Franceschini N, He C, Lunetta KL, Visser JA, Byrne EM, Cousminer DL et al (2010) Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies. Nat Genet 42:1077–1085
Fuchsberger C, Flannick J, Teslovich TM et al (2016) The genetic architecture of type 2 diabetes. Nature 536:41–47. https://doi.org/10.1038/nature18642
Gail MH (2008) Discriminatory accuracy from single-nucleotide polymorphisms in models to predict breast cancer risk. JNCI 100:1037–1041
Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, Yang H, Ch’ang L-Y, Huang W, Liu B, Shen Y et al (2003) The International HapMap Project. Nature 426:789–796
Gustavsson J, Mehlig K, Leander K, Berg C, Tognon G, Strandhagen E, Björck L, Rosengren A, Lissner L, Nyberg F (2016) FTO gene variation, macronutrient intake and coronary heart disease risk: a gene–diet interaction analysis. Eur J Nutr 55:247–255
Han SS, Rosenberg PS, Garcia-Closas M, Figueroa JD, Silverman D, Chanock SJ, Rothman N, Chatterjee N (2012) Likelihood ratio test for detecting gene (G)–environment (E) interactions under an additive risk model exploiting G–E independence for case–control data. Am J Epidemiol 176:1060–1067
Jiao S, Hsu L, Bézieau S, Brenner H, Chan AT, Chang-Claude J, Le Marchand L, Lemire M, Newcomb PA, Slattery ML et al (2013) SBERIA: set-based gene–environment interaction test for rare and common variants in complex diseases. Genet Epidemiol 37:452–464
Krischer JP, Lynch KF, Lernmark Å, Hagopian WA, Rewers MJ, She J-X, Toppari J, Ziegler A-G, Akolkar B, Group TS et al (2017) Genetic and environmental interactions modify the risk of diabetes-related autoimmunity by 6 years of age: the TEDDY study. Diabetes Care 40:1194–1202
Kwee LC, Epstein MP, Manatunga AK, Duncan R, Allen AS, Satten GA (2007) Simple methods for assessing haplotype–environment interactions in case-only and case–control studies. Genet Epidemiol 31:75–90
Liang L, Ma Y, Carroll RJ (2019) A semiparametric efficient estimator in case–control studies for gene–environment independent models. J Multivariate Anal 173:38–50
Lin DY, Zeng D (2006) Likelihood-based inference on haplotype effects in genetic association studies. J Am Stat Assoc 101:89–104
Lin X, Lee S, Wu MC, Wang C, Chen H, Li Z, Lin X (2015) Test for rare variants by environment interactions in sequencing association studies. Biometrics 72:156–164
Lobach I, Carroll RJ, Spinka C, Gail MH, Chatterjee N (2008) Haplotype-based regression analysis of case–control studies with unphased genotypes and measurement errors in environmental exposures. Biometrics 64:673–684
Lu T-T, Shiou S-H (2002) Inverses of 2 × 2 block matrices. Comput Math Appl 43:119–129
Ma Y (2010) A semiparametric efficient estimator in case–control studies. Bernoulli 16:585–603
Modan B, Hartge P, Hirsh-Yechezkel G, Chetrit A, Lubin F, Beller U, Ben-Baruch G, Fishman A, Menczer J, Struewing JP et al (2001) Parity, oral contraceptives, and the risk of ovarian cancer among carriers and noncarriers of a BRCA1 or BRCA2 mutation. N Engl J Med 345:235–240
Mukherjee B, Chatterjee N (2008) Exploiting gene–environment independence for analysis of case–control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics 64:685–694
Mullins N, Power RA, Fisher HL, Euesden J, Iniesta R, Craig IW, Farmer AE, McGuffin P, Breen G, Lewis CM et al (2016) Polygenic interactions with environmental adversity in the aetiology of major depressive disorder. Psychol Med 46:759–770
Nickels S (2013) Evidence of gene–environment interactions between common breast cancer susceptibility loci and established environmental risk factors. PLoS Genet 9:e1003284
Pfeiffer RM, Park Y, Kreimer AR, Lacey JV Jr, Pee D, Greenlee RT, Buys SS, Hollenbeck A, Rosner B, Gail MH et al (2013) Risk prediction for breast, endometrial, and ovarian cancer in white women aged 50 y or older: derivation and validation from population-based cohort studies. PLoS Med 10:e1001492
Piegorsch WW, Weinberg CR, Taylor JA (1994) Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case–control studies. Stat Med 13:153–162
Prentice RL, Pyke R (1979) Logistic disease incidence models and case–control studies. Biometrika 66:403–411
Rudolph A (2015) Investigation of gene–environment interactions between 47 newly identified breast cancer susceptibility loci and environmental risk factors. Int J Cancer 136:685–696
Stalder O, Asher A, Liang L, Carroll RJ, Ma Y, Chatterjee N (2017) Semiparametric analysis of complex polygenic gene–environment interactions in case–control studies. Biometrika 104:801–812