A task-based approach for Gene Ontology evaluation

Journal of Biomedical Semantics - Volume 4 - Pages 1-11 - 2013
Erik L Clarke¹, Salvatore Loguercio¹, Benjamin M Good¹, Andrew I Su¹
¹ The Scripps Research Institute, La Jolla, USA

Abstract

The Gene Ontology and its associated annotations are critical tools for interpreting lists of genes. Here, we introduce a method for evaluating the Gene Ontology annotations and structure based on their impact on gene set enrichment analysis, along with an example implementation. This task-based approach yields quantitative assessments grounded in experimental data and anchored tightly to the primary use of the annotations. Applied to specific areas of biological interest, our framework allowed us to track the progress of annotation and structural ontology changes from 2004 to 2012. It also showed that the annotations and structure in the area under test have been improving in their ability to recall underlying biological traits. Furthermore, we were able to distinguish between the impact of changes to the annotation sets and changes to the ontology structure. Our framework and implementation lay the groundwork for a powerful tool for evaluating the usefulness of the Gene Ontology. We demonstrate both the flexibility and the power of this approach in evaluating the current and past state of the Gene Ontology, as well as its applicability in developing new methods for creating gene annotations.
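To make the task-based idea concrete, the following is a minimal sketch of how two versions of an annotation set could be compared by their effect on an enrichment test. It is not the authors' implementation: the one-sided hypergeometric test stands in for whatever enrichment method the paper uses, and the gene identifiers, the term name `example_term`, and the helper functions `hypergeom_sf` and `enrichment_pvalue` are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible outcomes contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment_pvalue(term_genes, study_genes, background):
    """One-sided enrichment p-value for a single term (illustrative helper)."""
    term = set(term_genes) & set(background)
    study = set(study_genes) & set(background)
    overlap = len(term & study)
    return hypergeom_sf(overlap, len(background), len(term), len(study))


# Hypothetical toy data: 20 background genes, 5 of them in the study list.
background = {f"g{i}" for i in range(1, 21)}
study = {"g1", "g2", "g3", "g4", "g5"}

# Two hypothetical snapshots of the annotations for the same term.
annotations_old = {"example_term": {"g1", "g10", "g11"}}
annotations_new = {"example_term": {"g1", "g2", "g3", "g10"}}

p_old = enrichment_pvalue(annotations_old["example_term"], study, background)
p_new = enrichment_pvalue(annotations_new["example_term"], study, background)

print(f"old annotations: p = {p_old:.4f}")
print(f"new annotations: p = {p_new:.4f}")
```

In this toy setup the newer snapshot annotates more of the study genes to the term, so the term is recovered with a smaller p-value. Tracking such a statistic across dated snapshots for a term known to be relevant to the experiment is one way to read the evaluation described above.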
