Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics
Abstract
Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations are tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.
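The abstract refers to a mathematical expression for computing PrediXcan results from summary data but does not state it. As a rough illustration only, the sketch below assumes the commonly cited form of the S-PrediXcan statistic, Z_g ≈ Σ_l w_lg (σ_l / σ_g) (β_l / se(β_l)), where the variance of predicted expression is σ_g² = wᵀ Γ w and Γ is the SNP covariance taken from a reference panel. The function name `s_predixcan_z` and its argument names are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def s_predixcan_z(weights, gwas_z, snp_cov):
    """Approximate gene-level z-score from GWAS summary statistics.

    weights : (m,) prediction weights w_lg for the m SNPs in one gene's model
    gwas_z  : (m,) GWAS z-scores beta_l / se(beta_l) for the same SNPs
    snp_cov : (m, m) SNP genotype covariance estimated from a reference panel
    (All names here are illustrative assumptions.)
    """
    weights = np.asarray(weights, dtype=float)
    gwas_z = np.asarray(gwas_z, dtype=float)
    snp_cov = np.asarray(snp_cov, dtype=float)

    # Per-SNP standard deviations come from the diagonal of the covariance.
    sigma_l = np.sqrt(np.diag(snp_cov))
    # Standard deviation of predicted expression: sqrt(w' * Gamma * w).
    sigma_g = np.sqrt(weights @ snp_cov @ weights)
    if sigma_g == 0:
        raise ValueError("predicted expression has zero variance")

    # Weighted sum of SNP z-scores, rescaled by the ratio of standard deviations.
    return float(np.sum(weights * sigma_l * gwas_z) / sigma_g)
```

With a single-SNP model, the gene-level score reduces to the SNP's own z-score (with the sign of the weight), which is a quick sanity check on any implementation. How far the result can be trusted depends on how well the reference panel's covariance matches the GWAS study population, which is the "misspecified reference sets" question the abstract raises.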