CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Genome Research - Volume 25, Issue 7 - Pages 1043-1055 - 2015
Donovan H. Parks¹, Michael Imelfort¹, Connor T. Skennerton¹, Philip Hugenholtz¹˒², Gene W. Tyson¹˒³
¹ Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, QLD 4072, Queensland, Australia
² Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4072, Queensland, Australia
³ Advanced Water Management Centre, The University of Queensland, St. Lucia, QLD 4072, Queensland, Australia

Abstract

Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.
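
The abstract does not state how completeness and contamination are computed, so the following Python sketch is only an illustration of the general idea of scoring a genome against sets of collocated marker genes. It assumes that the lineage-specific marker genes have already been grouped into collocated sets and that each marker is expected to occur once per genome. The function name `estimate_quality`, its parameters, and the toy marker names are hypothetical and are not part of CheckM's interface.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def estimate_quality(
    marker_sets: List[Set[str]],
    observed_markers: Iterable[str],
) -> Tuple[float, float]:
    """Return (completeness %, contamination %) for one genome.

    marker_sets: hypothetical collocated sets of single-copy marker genes.
    observed_markers: marker names found in the genome, one entry per hit,
        so a marker detected twice appears twice.
    """
    if not marker_sets or any(len(s) == 0 for s in marker_sets):
        raise ValueError("marker_sets must contain non-empty sets")

    counts = Counter(observed_markers)
    n_sets = len(marker_sets)

    # Completeness: fraction of each set that is present, averaged over sets.
    completeness = sum(
        sum(1 for gene in s if counts[gene] >= 1) / len(s) for s in marker_sets
    ) / n_sets

    # Contamination: extra copies beyond the first, per set, averaged over sets.
    contamination = sum(
        sum(max(counts[gene] - 1, 0) for gene in s) / len(s) for s in marker_sets
    ) / n_sets

    return 100.0 * completeness, 100.0 * contamination


# Toy example: marker D is missing and marker C is found twice.
toy_sets = [{"A", "B"}, {"C", "D"}]
toy_hits = ["A", "B", "C", "C"]
toy_completeness, toy_contamination = estimate_quality(toy_sets, toy_hits)
print(toy_completeness, toy_contamination)  # 75.0 25.0
```

Averaging over sets rather than over individual genes means that a cluster of neighbouring markers lost or duplicated together counts as roughly one event, which is the motivation for using collocation information in this sketch.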
