GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database

Bioinformatics - Volume 36, Issue 6 - Pages 1925-1927 - 2020
Pierre-Alain Chaumeil, Aaron J. Mussig, Philip Hugenholtz, Donovan H. Parks
Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia

Abstract

Summary: The Genome Taxonomy Database Toolkit (GTDB-Tk) provides objective taxonomic assignments for bacterial and archaeal genomes based on the GTDB. GTDB-Tk is computationally efficient and able to classify thousands of draft genomes in parallel. Here we demonstrate the accuracy of the GTDB-Tk taxonomic assignments by evaluating its performance on a phylogenetically diverse set of 10,156 bacterial and archaeal metagenome-assembled genomes.

Availability and implementation: GTDB-Tk is implemented in Python and licenced under the GNU General Public Licence v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk.

Supplementary information: Supplementary data are available at Bioinformatics online.
