An integrative approach for measuring semantic similarities using gene ontology
Abstract
Gene Ontology (GO) provides rich information and a convenient way to study gene functional similarity, and it has been used successfully in various applications. However, existing GO-based similarity measures are limited because each one considers only a subset of the information in GO. An appropriate integration of the existing measures that takes more of this information into account is therefore needed. We propose a novel integrative measure called InteGO2, which automatically selects appropriate seed measures and then integrates them using a metaheuristic search method. The experimental results show that InteGO2 significantly improves the performance of gene similarity measurement in human, Arabidopsis and yeast on both the molecular function and biological process GO categories. InteGO2 computes gene-to-gene similarities more accurately than the existing measures tested and is highly robust. The supplementary document and software are available at http://mlg.hit.edu.cn:8082/.