Exact sequence variants should replace operational taxonomic units in marker-gene data analysis

ISME Journal - Volume 11, Issue 12 - Pages 2639-2643 - 2017
Benjamin J. Callahan (1), Paul J. McMurdie (2), Susan Holmes (3)
(1) Department of Population Health and Pathobiology, NC State University, Raleigh, NC, USA
(2) Whole Biome Inc, San Francisco, CA, USA
(3) Department of Statistics, Stanford University, Stanford, CA, USA

Abstract

Recent advances have made it possible to analyze high-throughput marker-gene sequencing data without resorting to the customary construction of molecular operational taxonomic units (OTUs): clusters of sequencing reads that differ by less than a fixed dissimilarity threshold. New methods control errors sufficiently such that amplicon sequence variants (ASVs) can be resolved exactly, down to the level of single-nucleotide differences over the sequenced gene region. The benefits of finer resolution are immediately apparent, and arguments for ASV methods have focused on their improved resolution. Less obvious, but we believe more important, are the broad benefits that derive from the status of ASVs as consistent labels with intrinsic biological meaning identified independently from a reference database. Here we discuss how these features grant ASVs the combined advantages of closed-reference OTUs—including computational costs that scale linearly with study size, simple merging between independently processed data sets, and forward prediction—and of de novo OTUs—including accurate measurement of diversity and applicability to communities lacking deep coverage in reference databases. We argue that the improvements in reusability, reproducibility and comprehensiveness are sufficiently great that ASVs should replace OTUs as the standard unit of marker-gene analysis and reporting.
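
The merging argument can be made concrete with a minimal sketch. It assumes each independently processed study yields a table of read counts keyed by the exact sequence; the function names and toy sequences below are hypothetical illustrations, not the interface of any tool discussed here. Because the key is the sequence itself, identical variants from different studies line up without re-clustering and without a reference database, whereas de novo OTU labels are only meaningful within the data set that defined them.

```python
from collections import Counter


def merge_asv_tables(*tables):
    """Merge per-study ASV count tables keyed by exact sequence.

    Each table maps sample name -> {sequence: read count}.
    (Hypothetical helper for illustration only.)
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample name: {sample}")
            # Counter returns 0 for sequences a sample never contained,
            # so absent variants need no explicit padding.
            merged[sample] = Counter(counts)
    return merged


def shared_variants(merged):
    """Return the sequences observed (count > 0) in every sample."""
    per_sample = [
        {seq for seq, n in counts.items() if n > 0}
        for counts in merged.values()
    ]
    return set.intersection(*per_sample) if per_sample else set()


# Toy, made-up data: two studies processed independently of each other.
study_a = {"A1": {"ACGTACGT": 120, "ACGTACGA": 8}}
study_b = {"B1": {"ACGTACGT": 45, "TTGTACGA": 30}}

merged = merge_asv_tables(study_a, study_b)
common = shared_variants(merged)
```

In this sketch the merge cost grows with the number of samples added, since each study's table is computed once and never revisited, which mirrors the linear-scaling claim made for ASVs and closed-reference OTUs.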
