Exact sequence variants should replace operational taxonomic units in marker-gene data analysis
Abstract
Recent advances have made it possible to analyze high-throughput marker-gene sequencing data without resorting to the customary construction of molecular operational taxonomic units (OTUs): clusters of sequencing reads that differ by less than a fixed dissimilarity threshold. New methods control errors sufficiently such that amplicon sequence variants (ASVs) can be resolved exactly, down to the level of single-nucleotide differences over the sequenced gene region. The benefits of finer resolution are immediately apparent, and arguments for ASV methods have focused on their improved resolution. Less obvious, but we believe more important, are the broad benefits that derive from the status of ASVs as consistent labels with intrinsic biological meaning identified independently from a reference database. Here we discuss how these features grant ASVs the combined advantages of closed-reference OTUs—including computational costs that scale linearly with study size, simple merging between independently processed data sets, and forward prediction—and of de novo OTUs—including accurate measurement of diversity and applicability to communities lacking deep coverage in reference databases. We argue that the improvements in reusability, reproducibility and comprehensiveness are sufficiently great that ASVs should replace OTUs as the standard unit of marker-gene analysis and reporting.
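The claim that ASVs allow simple merging between independently processed data sets can be illustrated with a minimal sketch. It assumes each study's ASV table is a plain mapping from sample name to per-sequence read counts. The function names (`merge_asv_tables`, `total_counts`) and the toy sequences are hypothetical, chosen for illustration only, and are not taken from any ASV tool. Because each ASV is labeled by its exact sequence, tables from separate runs can be combined by key without re-clustering the pooled reads.

```python
from collections import Counter


def merge_asv_tables(*tables):
    """Combine independently processed ASV tables.

    Each table maps sample name -> {exact ASV sequence -> read count}.
    The sequence itself is the label, so no re-clustering is needed.
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample name: {sample}")
            merged[sample] = Counter(counts)
    return merged


def total_counts(merged):
    """Sum read counts per ASV sequence across all samples."""
    totals = Counter()
    for counts in merged.values():
        totals.update(counts)
    return totals


# Toy data (hypothetical sequences, far shorter than real amplicons).
study_a = {"a1": {"ACGTAC": 10, "ACGTAT": 3}}
study_b = {"b1": {"ACGTAC": 7, "TTGCAA": 2}}

merged = merge_asv_tables(study_a, study_b)
totals = total_counts(merged)
```

Here `ACGTAC` and `ACGTAT` differ by a single nucleotide and remain distinct labels. De novo OTU labels, by contrast, are defined relative to the data set in which they were clustered, so two such tables could not be joined by key in this way.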