Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions

Springer Science and Business Media LLC - Volume 9 - Pages 1-13 - 2014
Usha Kuppuswamy1, Seshan Ananthasubramanian1,2, Yanli Wang1, Narayanaswamy Balakrishnan3, Madhavi K Ganapathiraju1,2
1Department of Biomedical Informatics and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, USA
2Intelligent Systems Program, University of Pittsburgh, Pittsburgh, USA
3Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India

Abstract

The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to determine the mechanism of action of these genes. However, GWAS often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are associated with breast cancer, but their molecular functions are unknown. We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is based on the premise that two proteins that interact biophysically are in physical proximity to each other, possess complementary molecular functions, and play roles in related biological processes. Predicted GO terms were ranked according to their relative association scores, and the approach was evaluated quantitatively by plotting precision versus recall and by plotting F-scores (the harmonic mean of precision and recall) against varying thresholds. Precisions of ~58% for protein localization and ~40% for protein function were obtained at a threshold of ~30 (the top 30 GO terms in the ranked list). A comparison with a function prediction method that computes semantic similarity among nodes in an ontology and incorporates those similarities into a k-nearest neighbor classifier showed that our results compare favorably. The approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available online (http://severus.dbmi.pitt.edu/engo/GOPRED.html).
We present the algorithm, its evaluation, and the results of the computational predictions, with emphasis on genes that GWAS have associated with diseases, which are of translational interest.
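The evaluation described in the abstract (precision, recall and F-score computed over the top-ranked GO terms at a given threshold) can be illustrated with a minimal sketch. This is not the authors' code: the function name `precision_recall_f`, the per-protein framing, and the example ranking are assumptions made for illustration, and the GO identifiers are used only as placeholder labels.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f(
    ranked_terms: Sequence[str], true_terms: Set[str], k: int
) -> Tuple[float, float, float]:
    """Score the top-k predicted GO terms of one protein against its known terms.

    ranked_terms: predicted GO terms, best first (hypothetical input format).
    true_terms:   known annotations held out for evaluation.
    k:            threshold, i.e. how many top-ranked terms count as predictions.
    """
    top_k = set(ranked_terms[:k])
    hits = len(top_k & true_terms)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    # F-score is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Illustrative data only; this ranking is made up for the example.
ranked = ["GO:0005634", "GO:0005737", "GO:0005886", "GO:0005829"]
known = {"GO:0005634", "GO:0005829", "GO:0005739"}

for k in (2, 4):
    p, r, f = precision_recall_f(ranked, known, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Sweeping `k` and averaging over proteins would produce curves of the kind the abstract mentions; whether the authors averaged per protein or pooled predictions is not stated here, so that choice is left open.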
