Solving the Problem: Genome Annotation Standards before the Data Deluge

Standards in Genomic Sciences - Volume 5, Issue 1 - Pages 168-193
William Klimke1, Claire O’Donovan2, Owen White3, J. Rodney Brister1, Karen Clark1, Boris Fedorov1, Ilene Karsch-Mizrachi1, Kim D. Pruitt1, Tatiana Tatusova1,4,5
1 The National Center for Biotechnology Information, National Library of Medicine, NIH, Building 45, Bethesda, MD 20894, USA
2 UniProt, The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
3 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
4 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
5 UniProt, The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

References

Bork, 1992, Comprehensive sequence analysis of the 182 predicted open reading frames of yeast chromosome III., Protein Sci, 1, 1677, 10.1002/pro.5560011216

Bork, 1992, What's in a genome?, Nature, 358, 287, 10.1038/358287a0

Fleischmann, 1995, Whole-genome random sequencing and assembly of Haemophilus influenzae Rd., Science, 269, 496, 10.1126/science.7542800

White, 2010, Meeting Report: Towards a Critical Assessment of Functional Annotation Experiment (CAFAE) for bacterial genome annotation., Stand Genomic Sci, 3, 240, 10.4056/sigs.1323436

Ouzounis CA, Karp PD. The past, present and future of genome-wide re-annotation. Genome Biol 2002;3(2):COMMENT2001.

Ouzounis, 1995, New protein functions in yeast chromosome VIII., Protein Sci, 4, 2424, 10.1002/pro.5560041121

Kyrpides, 2009, Fifteen years of microbial genomics: meeting the challenges and fulfilling the dream., Nat Biotechnol, 27, 627, 10.1038/nbt.1552

Liolios K, Chen IM, Mavromatis K, Tavernarakis N, Hugenholtz P, Markowitz VM, Kyrpides NC. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2010;38(Database issue):D346-54.

Fraser CM, Eisen JA, Nelson KE, Paulsen IT, Salzberg SL. The value of complete microbial genome sequencing (you get what you pay for). J Bacteriol 2002;184(23):6403-5; discussion 6405.

Metzker, 2010, Sequencing technologies - the next generation., Nat Rev Genet, 11, 31, 10.1038/nrg2626

Schnoes, 2009, Annotation error in public databases: misannotation of molecular function in enzyme superfamilies., PLOS Comput Biol, 5, e1000605, 10.1371/journal.pcbi.1000605

Dall'Olio, 2010, The annotation and the usage of scientific databases could be improved with public issue tracker software., Database (Oxford), 2010, baq035, 10.1093/database/baq035

Ussery, 2004, Genome Update: annotation quality in sequenced microbial genomes., Microbiology, 150, 2015, 10.1099/mic.0.27338-0

Andorf, 2007, Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach., BMC Bioinformatics, 8, 284, 10.1186/1471-2105-8-284

Galperin, 2001, Novel domains of the prokaryotic two-component signal transduction systems., FEMS Microbiol Lett, 203, 11, 10.1111/j.1574-6968.2001.tb10814.x

Pei, 2001, GGDEF domain is homologous to adenylyl cyclase., Proteins, 42, 210, 10.1002/1097-0134(20010201)42:2<210::AID-PROT80>3.0.CO;2-8

Römling, 2005, C-di-GMP: the dawning of a novel bacterial signalling system., Mol Microbiol, 57, 629, 10.1111/j.1365-2958.2005.04697.x

Rentzsch, 2009, Protein function prediction--the power of multiplicity., Trends Biotechnol, 27, 210, 10.1016/j.tibtech.2009.01.002

Lowe, 1997, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence., Nucleic Acids Res, 25, 955, 10.1093/nar/25.5.955

Lagesen, 2007, RNAmmer: consistent and rapid annotation of ribosomal RNA genes., Nucleic Acids Res, 35, 3100, 10.1093/nar/gkm160

Glasner, 2006, ASAP: a resource for annotating, curating, comparing, and disseminating genomic data., Nucleic Acids Res, 34, D41, 10.1093/nar/gkj164

Greene, 2007, National Institute of Allergy and Infectious Diseases bioinformatics resource centers: new assets for pathogen informatics., Infect Immun, 75, 3212, 10.1128/IAI.00105-07

Pruitt, 2009, NCBI Reference Sequences: current status, policy and new initiatives., Nucleic Acids Res, 37, D32, 10.1093/nar/gkn721

Klimke, 2009, The National Center for Biotechnology Information's Protein Clusters Database., Nucleic Acids Res, 37, D216, 10.1093/nar/gkn734

2009, The Universal Protein Resource (UniProt) 2009., Nucleic Acids Res, 37, D169, 10.1093/nar/gkn664

Kersey, 2004, Integr8 and Genome Reviews: integrated views of complete genomes and proteomes., Nucleic Acids Res, 33, D297, 10.1093/nar/gki039

Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S and others. Ensembl 2011. Nucleic Acids Res 2011;39(Database issue):D800-6.

Brazma, 2001, Minimum information about a microarray experiment (MIAME)-toward standards for microarray data., Nat Genet, 29, 365, 10.1038/ng1201-365

Field, 2008, The minimum information about a genome sequence (MIGS) specification., Nat Biotechnol, 26, 541, 10.1038/nbt1360

Taylor, 2008, Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project., Nat Biotechnol, 26, 889, 10.1038/nbt.1411

Gaudet, 2011, Towards BioDBcore: a community-defined information specification for biological databases., Nucleic Acids Res, 39, D7, 10.1093/nar/gkq1173

Quackenbush, 2009, Data reporting standards: making the things we use better., Genome Med, 1, 111, 10.1186/gm111

Kaminuma, 2010, DDBJ launches a new archive database with analytical tools for next-generation sequence data., Nucleic Acids Res, 38, D33, 10.1093/nar/gkp847

Leinonen, 2011, The European Nucleotide Archive., Nucleic Acids Res, 39, D28, 10.1093/nar/gkq967

Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res 2007;35(Web Server issue):W182-5.

Aziz, 2008, The RAST Server: rapid annotations using subsystems technology., BMC Genomics, 9, 75, 10.1186/1471-2164-9-75

JGI website. http://www.jgi.doe.gov/

Goll, 2010, The Protein Naming Utility: a rules database for protein nomenclature., Nucleic Acids Res, 38, D336, 10.1093/nar/gkp958

Antonov, 2010, Genetack: frameshift identification in protein-coding sequences by the Viterbi algorithm., J Bioinform Comput Biol, 8, 535, 10.1142/S0219720010004847

Sayers, 2011, Database resources of the National Center for Biotechnology Information., Nucleic Acids Res, 39, D38, 10.1093/nar/gkq1172

Riley, 2006, Escherichia coli K-12: a cooperatively developed annotation snapshot--2005., Nucleic Acids Res, 34, 1, 10.1093/nar/gkj405

Siguier, 2006, ISfinder: the reference centre for bacterial insertion sequences., Nucleic Acids Res, 34, D32, 10.1093/nar/gkj014

Roberts, 2008, Revised nomenclature for transposable genetic elements., Plasmid, 60, 167, 10.1016/j.plasmid.2008.08.001

Tatusov, 2003, The COG database: an updated version includes eukaryotes., BMC Bioinformatics, 4, 41, 10.1186/1471-2105-4-41

Lima, 2009, HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot., Nucleic Acids Res, 37, D471, 10.1093/nar/gkn661

Aoki-Kinoshita, 2007, Gene annotation and pathway mapping in KEGG., Methods Mol Biol, 396, 71, 10.1007/978-1-59745-515-2_6

Selengut, 2007, TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes., Nucleic Acids Res, 35, D260, 10.1093/nar/gkl1043

Leplae, 2010, ACLAME: a CLAssification of Mobile genetic Elements, update 2010., Nucleic Acids Res, 38, D57, 10.1093/nar/gkp938

NCBI Genome Annotation Workshop. http://www.ncbi.nlm.nih.gov/genomes/AnnotationWorkshop.html

Pruitt, 2009, The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes., Genome Res, 19, 1316, 10.1101/gr.080531.108

Blattner, 1997, The complete genome sequence of Escherichia coli K-12., Science, 277, 1453, 10.1126/science.277.5331.1453

Keseler, 2011, EcoCyc: a comprehensive database of Escherichia coli biology., Nucleic Acids Res, 39, D583, 10.1093/nar/gkq1143

Rudd, 2000, EcoGene: a genome sequence database for Escherichia coli K-12., Nucleic Acids Res, 28, 60, 10.1093/nar/28.1.60

Benson, 2011, GenBank., Nucleic Acids Res, 39, D32, 10.1093/nar/gkq1079

BioProject. http://www.ncbi.nlm.nih.gov/genomeprj

Angiuoli, 2008, Toward an online repository of Standard Operating Procedures (SOPs) for (meta)genomic annotation., OMICS, 12, 137, 10.1089/omi.2008.0017

Winsor, 2009, Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes., Nucleic Acids Res, 37, D483, 10.1093/nar/gkn861

2010, The Gene Ontology in 2010: extensions and refinements., Nucleic Acids Res, 38, D331, 10.1093/nar/gkp1018

Gil, 2004, Determination of the core of a minimal bacterial gene set., Microbiol Mol Biol Rev, 68, 518, 10.1128/MMBR.68.3.518-537.2004

Harris, 2003, The genetic core of the universal ancestor., Genome Res, 13, 407, 10.1101/gr.652803

Lipman, 2002, The relationship of protein conservation and sequence length., BMC Evol Biol, 2, 20, 10.1186/1471-2148-2-20

Giovannoni, 2005, Genome streamlining in a cosmopolitan oceanic bacterium., Science, 309, 1242, 10.1126/science.1114057

Nakabachi, 2006, The 160-kilobase genome of the bacterial endosymbiont Carsonella., Science, 314, 267, 10.1126/science.1134196

McCutcheon, 2009, Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont., PLoS Genet, 5, e1000565, 10.1371/journal.pgen.1000565

Dufresne, 2005, Accelerated evolution associated with genome reduction in a free-living prokaryote., Genome Biol, 6, R14, 10.1186/gb-2005-6-2-r14

Rocap, 2003, Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation., Nature, 424, 1042, 10.1038/nature01947

Willenbrock, 2005, Genome update: 2D clustering of bacterial genomes., Microbiology, 151, 333, 10.1099/mic.0.27811-0

Moran, 2009, The dynamics and time scale of ongoing genomic erosion in symbiotic bacteria., Science, 323, 379, 10.1126/science.1167140

Shigenobu, 2000, Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS., Nature, 407, 81, 10.1038/35024074

Shen, 2010, Complete genome sequences of Yersinia pestis from natural foci in China., J Bacteriol, 192, 3551, 10.1128/JB.00340-10

Jeong, 2009, Genome sequences of Escherichia coli B strains REL606 and BL21(DE3)., J Mol Biol, 394, 644, 10.1016/j.jmb.2009.09.052

Karro, 2007, Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation., Nucleic Acids Res, 35, D55, 10.1093/nar/gkl851

Liu, 2004, Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes., Genome Biol, 5, R64, 10.1186/gb-2004-5-9-r64

Kuo, 2010, The extinction dynamics of bacterial pseudogenes., PLoS Genet, 6, 10.1371/journal.pgen.1001050

Okuda S, Yamada T, Hamajima M, Itoh M, Katayama T, Bork P, Goto S, Kanehisa M. KEGG Atlas mapping for global analysis of metabolic pathways. Nucleic Acids Res 2008;36(Web Server issue):W423-6.

Koonin, 2008, Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world., Nucleic Acids Res, 36, 6688, 10.1093/nar/gkn668

Tettelin, 2005, Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome"., Proc Natl Acad Sci USA, 102, 13950, 10.1073/pnas.0506758102

Hunter, 2009, InterPro: the integrative protein signature database., Nucleic Acids Res, 37, D211, 10.1093/nar/gkn785

Brister, 2010, Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop., Viruses, 2, 2258, 10.3390/v2102258

Roberts, 2011, COMBREX: a project to accelerate the functional annotation of prokaryotic genomes., Nucleic Acids Res, 39, D11, 10.1093/nar/gkq1168