The RAST Server: Rapid Annotations using Subsystems Technology

Springer Science and Business Media LLC - Volume 9 - Pages 1-15 - 2008
Ramy K Aziz1,2, Daniela Bartels3, Aaron A Best4, Matthew DeJongh4, Terrence Disz5,3, Robert A Edwards6,5, Kevin Formsma4, Svetlana Gerdes6, Elizabeth M Glass5, Michael Kubal3, Folker Meyer5,3, Gary J Olsen7,5, Robert Olson5,3, Andrei L Osterman6,8, Ross A Overbeek6, Leslie K McNeil9, Daniel Paarmann3, Tobias Paczian3, Bruce Parrello6, Gordon D Pusch6,3, Claudia Reich9, Rick Stevens5,3, Olga Vassieva6, Veronika Vonstein6, Andreas Wilke3, Olga Zagnitko6
1University of Tennessee Health Science Center, Memphis, USA
2Department of Microbiology and Immunology, Cairo University, Cairo, Egypt
3Computation Institute, University of Chicago, Chicago, USA
4Hope College, Holland, USA
5Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA
6Fellowship for Interpretation of Genomes, Burr Ridge, USA
7Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, USA
8The Burnham Institute, San Diego, USA
9National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, USA

Abstract

The number of available prokaryotic genome sequences is growing steadily, and faster than our ability to annotate them accurately. We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network, and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged by the accuracy, consistency, and completeness of the annotations it produces. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. By providing accurate, rapid annotation freely to the community, we have created an important community resource. The service has now been used by over 120 external users to annotate over 350 distinct genomes.
