Toward accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods

Ecology and Evolution - Volume 5, Issue 11 - Pages 2252-2266 - 2015
Jullien M. Flynn¹, Emily Brown¹,², Frédéric J. J. Chain¹, Hugh J. MacIsaac², Melania E. Cristescu¹
¹Department of Biology, McGill University, 1205 Docteur Penfield, Stewart Biology Building, Montreal, Quebec, Canada, H3A 1B1.
²Great Lakes Institute for Environmental Research, University of Windsor, Windsor, Ontario, Canada

Abstract

Metabarcoding has the potential to become a rapid, sensitive, and effective approach for identifying species in complex environmental samples. Accurate molecular identification of species depends on the ability to generate operational taxonomic units (OTUs) that correspond to biological species. Because this method sometimes yields enormous biodiversity estimates, there is a pressing need to test the efficacy of the data analysis methods used to derive OTUs. Here, we evaluate the performance of various methods for clustering length-variable 18S amplicons from complex samples into OTUs, using a mock community and a natural community of zooplankton species. We compare analytic procedures consisting of combinations of (1) stringent versus relaxed data filtering, (2) inclusion versus removal of singleton sequences, (3) three commonly used clustering algorithms (mothur, UCLUST, and UPARSE), and (4) three methods of treating alignment gaps when calculating sequence divergence. Depending on the combination of methods used, the number of OTUs varied by nearly two orders of magnitude for the mock community (60–5068 OTUs) and by three orders of magnitude for the natural community (22–22191 OTUs). Relaxed filtering and the inclusion of singletons greatly inflated OTU numbers without improving the recovery of species. Our results also suggest that the method used to treat gaps when calculating sequence divergence can have a large impact on the number of OTUs. Our findings are particularly relevant to studies that cover taxonomically diverse species and employ markers, such as rRNA genes, in which length variation is extensive.
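Two of the factors compared above, singleton removal and the gap-treatment convention, can change OTU counts even on a handful of reads. The toy sketch below illustrates this under stated assumptions: reads are already aligned to equal length; the three gap conventions (`eachgap`, `onegap`, `nogaps`) are modeled loosely on the `calc` options of mothur's `dist.seqs`; and clustering is a simple greedy, abundance-ordered centroid scheme. The function names, the 3% threshold, and the sequences are hypothetical. This is not the authors' pipeline, nor a reimplementation of mothur, UCLUST, or UPARSE.

```python
from collections import Counter

GAP = "-"
GAP_MODES = ("eachgap", "onegap", "nogaps")


def divergence(a: str, b: str, gap_mode: str = "onegap") -> float:
    """Proportion of differing columns between two pre-aligned sequences.

    Gap conventions (modeled loosely on mothur's dist.seqs `calc` options):
      eachgap: every column with a gap in one sequence counts as a difference
      onegap:  a run of consecutive gap columns counts as a single difference
      nogaps:  columns with a gap in either sequence are ignored entirely
    Columns gapped in both sequences are always skipped. End gaps are
    treated like internal gaps (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gap_mode not in GAP_MODES:
        raise ValueError(f"gap_mode must be one of {GAP_MODES}")
    diffs = compared = 0
    in_gap_run = False
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue  # shared alignment gap: uninformative, does not end a run
        if x == GAP or y == GAP:
            # eachgap scores every gap column; onegap scores only the first
            # column of a run; nogaps scores nothing
            if gap_mode == "eachgap" or (gap_mode == "onegap" and not in_gap_run):
                diffs += 1
                compared += 1
            in_gap_run = True
            continue
        in_gap_run = False
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def cluster_otus(reads, threshold=0.03, gap_mode="onegap", drop_singletons=True):
    """Greedy centroid clustering of aligned reads, most abundant first.

    Each unique sequence joins the first existing centroid within
    `threshold` divergence; otherwise it founds a new OTU.
    Returns the list of OTU centroids.
    """
    counts = Counter(reads)
    if drop_singletons:
        # discard sequences observed only once before clustering
        counts = Counter({seq: n for seq, n in counts.items() if n > 1})
    centroids = []
    for seq, _ in counts.most_common():
        if not any(divergence(seq, c, gap_mode) <= threshold for c in centroids):
            centroids.append(seq)
    return centroids


# Toy aligned reads (hypothetical): b carries a 3-bp deletion relative to a
a = "ACGT" * 10
b = a[:20] + "---" + a[23:]
c = "T" * 40  # an unrelated read seen only once
reads = [a] * 5 + [b] * 3 + [c]

for mode in GAP_MODES:
    print(mode, round(divergence(a, b, mode), 4), len(cluster_otus(reads, gap_mode=mode)))
print("singletons kept:", len(cluster_otus(reads, drop_singletons=False)))
```

With these toy reads, `eachgap` scores the 3-bp indel as three differences (7.5% divergence) and places the variant in its own OTU, while `onegap` (about 2.6%) and `nogaps` (0%) merge it with its parent, giving one OTU. Keeping the singleton adds a further OTU. Real pipelines differ in alignment, end-gap handling, and centroid selection, so only the direction of these effects, not the numbers, carries over.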
