Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform

Applied and Environmental Microbiology - Volume 79, Issue 17 - Pages 5112-5120 - 2013
James J. Kozich (1), Sarah L. Westcott (1), Nielson T. Baxter (1), Sarah K. Highlander (2), Patrick D. Schloss (1)
(1) Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
(2) Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, USA

Abstract

Rapid advances in sequencing technology have changed the experimental landscape of microbial ecology. In the last 10 years, the field has moved from sequencing hundreds of 16S rRNA gene fragments per study using clone libraries to sequencing millions of fragments per study using next-generation sequencing technologies from 454 and Illumina. As these technologies advance, it is critical to assess the strengths, weaknesses, and overall suitability of these platforms for the interrogation of microbial communities. Here, we present an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads. We evaluated three overlapping regions of the 16S rRNA gene that vary in length (i.e., V34, V4, and V45) by resequencing a mock community and natural samples from human feces, mouse feces, and soil. By titrating the concentration of 16S rRNA gene amplicons applied to the flow cell and using a quality score-based approach to correct discrepancies between reads used to construct contigs, we were able to reduce error rates by as much as two orders of magnitude. Finally, we reprocessed samples from a previous study to demonstrate that large numbers of samples could be multiplexed and sequenced in parallel with shotgun metagenomes. These analyses demonstrate that our approach can provide data that are at least as good as those generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost.
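
The quality score-based correction of discrepancies between paired reads can be illustrated with a minimal sketch. It assumes the two reads have already been aligned over their overlap and that the reverse read has been reverse-complemented. The function name `merge_overlap`, the parameter `min_delta`, and its default value are hypothetical choices made for illustration; the abstract does not specify the rule or threshold actually used.

```python
def merge_overlap(fwd, fwd_q, rev, rev_q, min_delta=6):
    """Build a consensus over an already-aligned overlap of two reads.

    fwd, rev     : base strings of equal length (rev already reverse-complemented)
    fwd_q, rev_q : per-base quality scores (integers) for each read
    min_delta    : assumed minimum quality difference needed to trust one base
                   over the other when the reads disagree (illustrative value)
    """
    if not (len(fwd) == len(rev) == len(fwd_q) == len(rev_q)):
        raise ValueError("sequences and quality lists must have equal length")

    consensus = []
    for a, qa, b, qb in zip(fwd, fwd_q, rev, rev_q):
        if a == b:
            # Reads agree: keep the shared base.
            consensus.append(a)
        elif qa - qb >= min_delta:
            # Forward base is clearly more reliable.
            consensus.append(a)
        elif qb - qa >= min_delta:
            # Reverse base is clearly more reliable.
            consensus.append(b)
        else:
            # Neither base is clearly better: mark the position ambiguous.
            consensus.append("N")
    return "".join(consensus)


if __name__ == "__main__":
    # Position 3 disagrees; the forward read has the much higher quality score.
    print(merge_overlap("ACGT", [30, 30, 30, 30], "ACCT", [30, 30, 10, 30]))
```

Positions that remain ambiguous are marked `N` here so that a later filtering step could discard or flag the contig; that downstream handling is likewise an assumption of this sketch.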
