An effector index to predict target genes at GWAS loci

Springer Science and Business Media LLC - Volume 141 - Pages 1431-1447 - 2022
Vincenzo Forgetta1,2, Lai Jiang1,3, Nicholas A. Vulpescu4, Megan S. Hogan4, Siyuan Chen1,3, John A. Morris1,5,6,7, Stepan Grinek4, Christian Benner8, Dong-Keun Jang9, Quy Hoang9, Noel Burtt9, Jason A. Flannick9,10,11, Mark I. McCarthy12, Eric Fauman13, Celia M. T. Greenwood1,3,14,7, Matthew T. Maurano4, J. Brent Richards1,3,7,15,16,2
1 Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Pavillon H-413, Jewish General Hospital, Montreal, Canada
2 5 Prime Sciences Incorporated, Montreal, Canada
3 Departments of Medicine, Epidemiology and Biostatistics, McGill University, Montreal, Canada
4 Institute for Systems Genetics and Department of Pathology, NYU School of Medicine, New York, USA
5 New York Genome Center, New York, USA
6 Department of Biology, New York University, New York, USA
7 Department of Human Genetics, McGill University, Montréal, Canada
8 Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
9 Program in Medical and Population Genetics, Metabolism Program, Broad Institute of Harvard and MIT, Cambridge, USA
10 Department of Pediatrics, Harvard Medical School, Boston, USA
11 Division of Genetics and Genomics, Boston Children’s Hospital, Boston, USA
12 Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
13 Internal Medicine Research Unit, Pfizer Worldwide Research, Development and Medical, New York, USA
14 Gerald Bronfman Department of Oncology, McGill University, Montréal, Canada
15 Department of Medicine, McGill University, Montréal, Canada
16 Department of Twin Research, King’s College London, London, UK

Abstract

Drug development and biological discovery require effective strategies to map existing genetic associations to causal genes. To approach this problem, we selected 12 common diseases and quantitative traits for which highly powered genome-wide association studies (GWAS) were available. For each disease or trait, we systematically curated positive control gene sets from Mendelian forms of the disease and from targets of medicines used to treat it. We found that these positive control genes were highly enriched in proximity to GWAS-associated single-nucleotide variants (SNVs). We then quantitatively assessed the contribution of commonly used genomic features, including open chromatin maps, expression quantitative trait loci (eQTL), and chromatin conformation data. Using these features, we trained and validated an Effector Index (Ei) to map target genes for these 12 common diseases and traits. Ei demonstrated high predictive performance, both in cross-validation on the training set and on an independently derived set for type 2 diabetes. Key predictive features included coding or transcript-altering SNVs, distance to gene, and open chromatin-based metrics. This work outlines a simple, understandable, and systematic approach to prioritizing genes at GWAS loci for functional follow-up and drug development.
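
The enrichment of positive control genes near GWAS-associated SNVs can be illustrated with a simple overlap test. The sketch below is not the authors' analysis: it assumes a one-sided hypergeometric test, the function name `hypergeom_enrichment` is hypothetical, and all counts are made up for illustration.

```python
from math import comb


def hypergeom_enrichment(n_genes, n_positive, n_near, n_positive_near):
    """Fold enrichment and one-sided hypergeometric p-value.

    n_genes:          all genes considered (background)
    n_positive:       positive control genes in the background
    n_near:           genes within a chosen window of a GWAS SNV
    n_positive_near:  positive control genes among those nearby genes
    """
    # Number of positive control genes expected near SNVs by chance.
    expected = n_near * n_positive / n_genes
    fold = n_positive_near / expected

    # P(X >= n_positive_near) when n_near genes are drawn without replacement.
    upper = min(n_positive, n_near)
    total = comb(n_genes, n_near)
    tail = sum(
        comb(n_positive, k) * comb(n_genes - n_positive, n_near - k)
        for k in range(n_positive_near, upper + 1)
    )
    return fold, tail / total


# Made-up counts for illustration only; they are not taken from the study.
fold, p_value = hypergeom_enrichment(
    n_genes=20000, n_positive=50, n_near=2000, n_positive_near=20
)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.2e}")
```

With these illustrative counts, 5 positive control genes would be expected near SNVs by chance, so observing 20 corresponds to a four-fold enrichment. A real analysis would also need to define the distance window and account for gene density and linkage between neighbouring loci, which this sketch ignores.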
