Comparison of weighting approaches for genetic risk scores in gene-environment interaction studies

BMC Genetics - Volume 18 - Pages 1-12 - 2017
Anke Hüls1,2, Ursula Krämer3, Christopher Carlsten4,5,6, Tamara Schikowski7, Katja Ickstadt2, Holger Schwender8
1IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
2Faculty of Statistics, TU Dortmund University, Dortmund, Germany
3IUF – Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
4Department of Medicine, University of British Columbia, Vancouver, Canada
5Institute for Heart and Lung Health, Vancouver, Canada
6School of Population and Public Health, University of British Columbia, Vancouver, Canada
7IUF-Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
8Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany

Abstract

Weighted genetic risk scores (GRS), defined as weighted sums of risk alleles of single nucleotide polymorphisms (SNPs), are statistically powerful for detecting gene-environment (GxE) interactions. To assign weights, the gold standard is to use external weights from an independent study. However, appropriate external weights are not always available. In such situations and in the presence of predominant marginal genetic effects, we have shown in a previous study that GRS with internal weights from marginal genetic effects (“GRS-marginal-internal”) are a powerful and reliable alternative to single SNP approaches or the use of unweighted GRS. However, this approach might not be appropriate for detecting predominant interactions, i.e., interactions showing an effect stronger than the marginal genetic effect.
In this paper, we present a weighting approach for such predominant interactions (“GRS-interaction-training”) in which part of the data is used to estimate the weights from the interaction terms and the remaining data are used to determine the GRS. We conducted a simulation study for the detection of GxE interactions in which we evaluated power, type I error and sign-misspecification. We compared this new weighting approach to the GRS-marginal-internal approach and to GRS with external weights.
Our simulation study showed that in the absence of external weights and with predominant interaction effects, the highest power was reached with the GRS-interaction-training approach. If marginal genetic effects were predominant, the GRS-marginal-internal approach was more appropriate. Furthermore, the power to detect interactions reached by the GRS-interaction-training approach was only slightly lower than the power achieved by GRS with external weights. The power of the GRS-interaction-training approach was confirmed in a real data application to the Traffic, Asthma and Genetics (TAG) Study (N = 4465 observations).
When appropriate external weights are unavailable, we recommend using internal weights from the study population itself to construct weighted GRS for GxE interaction studies. If the SNPs were chosen because a strong marginal genetic effect was hypothesized, GRS-marginal-internal should be used. If the SNPs were chosen because of their collective impact on the biological mechanisms mediating the environmental effect (hypothesis of predominant interactions), GRS-interaction-training should be applied.
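
The split-sample idea behind GRS-interaction-training can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: weights are estimated with separate per-SNP linear regressions (the abstract does not name the estimator), the data are split 50/50, the outcome is continuous, and the data are simulated with invented parameters. All function names (`fit_ols`, `interaction_design`, `interaction_weights`, `grs_interaction_training`, `simulate`) are hypothetical and serve illustration only.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients and their standard errors for y ~ X."""
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))


def interaction_design(g, e):
    """Design matrix: intercept, genetic term, exposure, and their product."""
    return np.column_stack([np.ones_like(e), g, e, g * e])


def interaction_weights(snps, e, y):
    """Per-SNP interaction coefficients (assumed estimator), used as weights."""
    return np.array([
        fit_ols(interaction_design(snps[:, j], e), y)[0][3]
        for j in range(snps.shape[1])
    ])


def grs_interaction_training(snps, e, y, train_fraction=0.5, seed=0):
    """Estimate weights on a training part, test GRS x E on the remainder."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_fraction * len(y))
    train, test = idx[:n_train], idx[n_train:]

    # Step 1: weights from the interaction terms, training data only.
    weights = interaction_weights(snps[train], e[train], y[train])

    # Step 2: build the weighted GRS in the held-out data.
    grs = snps[test] @ weights

    # Step 3: test the GRS x E interaction in the held-out data.
    coef, se = fit_ols(interaction_design(grs, e[test]), y[test])
    return weights, coef[3], coef[3] / se[3]


def simulate(n=2000, p=10, maf=0.3, beta_int=0.3, seed=1):
    """Toy data with pure interaction effects (all parameters are invented)."""
    rng = np.random.default_rng(seed)
    snps = rng.binomial(2, maf, size=(n, p))  # risk-allele counts 0/1/2
    e = rng.normal(size=n)                    # environmental exposure
    y = beta_int * snps.sum(axis=1) * e + rng.normal(size=n)
    return snps, e, y


snps, e, y = simulate()
weights, beta_gxe, t_gxe = grs_interaction_training(snps, e, y)
print(weights.round(2), round(beta_gxe, 2), round(t_gxe, 1))
```

Keeping the weight estimation and the interaction test on disjoint subsets is the point of the design: the weights are not fitted to the same observations that are later used to test the GRS-by-environment interaction.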
