DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication

Bioinformatics (Oxford, England) - Volume 34, Issue 6 - Pages 1037-1039 - 2018
Yasuhiro Tanizawa1, Takatomo Fujisawa1, Yasukazu Nakamura1
1Center for Information Biology, National Institute of Genetics, Research Organization of Information and Systems, 1111 Yata, Mishima, Japan

Abstract

Summary

We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 min, with rich information such as pseudogenes, translation exceptions and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future.

Availability and implementation

The software is implemented in Python and runs under both Python 2.7 and Python 3.4 on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license, with external binaries bundled in the software distribution. An on-line version is also available at https://dfast.nig.ac.jp/.

Supplementary information

Supplementary data are available at Bioinformatics online.

