An integrative approach for measuring semantic similarities using gene ontology

BMC Systems Biology - Volume 8 - Pages 1-12 - 2014
Jiajie Peng1,2, Hongxiang Li1, Qinghua Jiang3, Yadong Wang1, Jin Chen2
1School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
2MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, USA
3School of Life Science and Technology, Harbin Institute of Technology, Harbin, China

Abstract

Gene Ontology (GO) provides rich information and a convenient way to study gene functional similarity, and it has been used successfully in a wide range of applications. However, existing GO-based similarity measures are limited because each considers only a subset of the information in GO. An appropriate integration of existing measures that takes more of the information in GO into account is therefore needed. We propose a novel integrative measure, InteGO2, which automatically selects appropriate seed measures and then integrates them using a metaheuristic search method. Experimental results show that InteGO2 significantly improves gene similarity performance in human, Arabidopsis and yeast, for both the molecular function and biological process GO categories. InteGO2 computes gene-to-gene similarities more accurately than the existing measures tested and is highly robust. The supplementary document and software are available at http://mlg.hit.edu.cn:8082/.
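
To make the idea of integrating seed measures with a search procedure more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the objective function, how seed measures are selected, or which metaheuristic is used, so everything here is an assumption for illustration: the seed measures are combined by a weighted sum, the weights are tuned by a simple stochastic hill climb (a stand-in for the metaheuristic), and the fitness is the Pearson correlation with a hypothetical reference signal. The toy scores, the `reference` list, and all function names are invented.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def combine(weights, seed_scores):
    """Weighted sum of the seed-measure scores for each gene pair."""
    n_pairs = len(seed_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, seed_scores))
        for i in range(n_pairs)
    ]


def search_weights(seed_scores, reference, steps=2000, step_size=0.1, rng_seed=0):
    """Stochastic hill climb over weights (assumed stand-in for the metaheuristic)."""
    rng = random.Random(rng_seed)
    k = len(seed_scores)
    best = [1.0 / k] * k  # start from equal weights
    best_fit = pearson(combine(best, seed_scores), reference)
    for _ in range(steps):
        # Perturb every weight, keep it positive, then renormalize to sum to 1.
        cand = [max(1e-9, w + rng.uniform(-step_size, step_size)) for w in best]
        total = sum(cand)
        cand = [w / total for w in cand]
        fit = pearson(combine(cand, seed_scores), reference)
        if fit > best_fit:  # accept only strict improvements
            best, best_fit = cand, fit
    return best, best_fit


# Hypothetical toy data: three seed measures scored on five gene pairs.
seed_scores = [
    [0.9, 0.2, 0.6, 0.1, 0.8],
    [0.5, 0.5, 0.4, 0.6, 0.5],
    [0.7, 0.3, 0.9, 0.2, 0.4],
]
# Hypothetical reference signal for the same five pairs.
reference = [0.85, 0.25, 0.70, 0.15, 0.70]

baseline_fit = pearson(combine([1 / 3] * 3, seed_scores), reference)
weights, fit = search_weights(seed_scores, reference)
print("weights:", [round(w, 3) for w in weights])
print("fitness:", round(fit, 4), "baseline:", round(baseline_fit, 4))
```

Because the search starts from equal weights and accepts only improvements, the returned fitness is never worse than the equal-weight baseline on the data it was tuned on. That says nothing about held-out gene pairs, which would need a separate evaluation.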
