Solving the Problem: Genome Annotation Standards before the Data Deluge