Prioritizing causal disease genes using unbiased genomic features

Genome Biology - Volume 15 - Pages 1-19 - 2014
Rahul C Deo1,2,3,4,5, Gabriel Musso5,6, Murat Tasan5,7, Paul Tang3, Annie Poon3, Christiana Yuan8, Janine F Felix9, Ramachandran S Vasan10,11, Rameen Beroukhim6,12, Teresa De Marco2, Pui-Yan Kwok1,3, Calum A MacRae5, Frederick P Roth5,7,13,14
1Cardiovascular Research Institute, University of California, San Francisco, USA
2Department of Medicine, University of California, San Francisco, USA
3Institute for Human Genetics, University of California, San Francisco, USA
4California Institute for Quantitative Biosciences, San Francisco, USA
5Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, USA
6Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
7Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto and Lunenfeld Research Institute, Mt Sinai Hospital, Toronto, Canada
8Cardiovascular Research Institute, University of California, San Francisco, USA
9Department of Epidemiology, Erasmus University Medical Center, Rotterdam, the Netherlands
10Preventive Medicine and Cardiology Sections, and Department of Medicine, Boston University School of Medicine, Boston, USA
11Framingham Heart Study, Boston University School of Medicine, Framingham, USA
12Center for Cancer Genome Discovery and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, USA
13Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, USA
14The Canadian Institute for Advanced Research, Toronto, Canada

Abstract

Cardiovascular disease (CVD) is the leading cause of death in the developed world. Human genetic studies, including genome-wide sequencing and SNP-array approaches, promise to reveal disease genes and mechanisms that represent new therapeutic targets. In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, limiting the clinical benefits. To aid in localizing causal genes, we develop a machine learning approach, Objective Prioritization for Enhanced Novelty (OPEN), which quantitatively prioritizes gene-disease associations based on a diverse group of genomic features. This approach uses only unbiased predictive features and is therefore not hampered by a preference towards previously well-characterized genes. We demonstrate success in identifying genetic determinants for CVD-related traits, including cholesterol levels, blood pressure, and conduction system and cardiomyopathy phenotypes. Using OPEN, we prioritize genes, including FLNC, for association with increased left ventricular diameter, a defining feature of the prevalent cardiovascular disorder dilated cardiomyopathy (DCM). We experimentally validate FLNC in a zebrafish model and identify a novel FLNC splice-site mutation in a patient with severe DCM. Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.
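To make the idea of feature-based prioritization concrete, the following is a minimal sketch of the general technique and not the OPEN implementation. It assumes a plain logistic regression trained on two hypothetical genome-wide features (the feature names, gene names, and all numeric values are invented for illustration), and then ranks the candidate genes at a single associated locus by predicted score. No literature-derived features are used, in the spirit of the unbiased features described in the abstract.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights (bias stored last) by batch gradient descent."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            # zip stops at the shorter sequence, so the bias term is added separately
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1]) - y
            for j, v in enumerate(x):
                grad[j] += err * v
            grad[-1] += err
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad)]
    return weights

def score(weights, x):
    """Predicted probability that a gene with feature vector x is trait-associated."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1])

def prioritize(weights, locus):
    """Return the gene names at a locus, highest predicted score first."""
    return sorted(locus, key=lambda gene: score(weights, locus[gene]), reverse=True)

# Hypothetical training set: [cardiac_expression, conservation] per gene (invented values).
train_x = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9],   # genes labeled as trait-associated
           [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]   # genes labeled as background
train_y = [1, 1, 1, 0, 0, 0]

# Hypothetical candidate genes under one association peak.
locus = {"GENE_A": [0.85, 0.75], "GENE_B": [0.2, 0.4], "GENE_C": [0.5, 0.5]}

weights = train_logistic(train_x, train_y)
ranking = prioritize(weights, locus)
print(ranking)
```

In this toy setting the score depends only on measured genomic properties, so a poorly studied gene can outrank a well-known neighbor if its features resemble those of the positive training genes.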

OPEN source code is available at [https://www.dropbox.com/s/3ufe2k1tsurqtux/open.tar.gz?dl=0]

OPEN predictions for GWA loci are available at [http://cvri.ucsf.edu/~deo/disease_mapping.html]