Swarm: robust and fast clustering method for amplicon-based studies

PeerJ - Volume 2 - Page e593
Frédéric Mahé1,2,3, Torbjørn Rognes4,5, Christopher Quince6, Colomban de Vargas1,3, Micah Dunthorn2
1CNRS, UMR 7144, EPEP – Évolution des Protistes et des Écosystèmes Pélagiques, Station Biologique de Roscoff, Roscoff, France
2Department of Ecology, University of Kaiserslautern, Kaiserslautern, Germany
3Sorbonne Universités, UPMC Univ Paris 06, UMR 7144, Station Biologique de Roscoff, Roscoff, France
4Department of Informatics, University of Oslo, Oslo, Norway
5Department of Microbiology, Oslo University Hospital Rikshospitalet, Oslo, Norway
6School of Engineering, University of Glasgow, Glasgow, UK

Abstract

Keywords


References

Bittner, 2013, Diversity patterns of uncultured Haptophytes unravelled by pyrosequencing in Naples Bay, Molecular Ecology, 22, 87, 10.1111/mec.12108

Caporaso, 2010, QIIME allows analysis of high-throughput community sequencing data, Nature Methods, 7, 335, 10.1038/nmeth.f.303

Caporaso, 2011, Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample, Proceedings of the National Academy of Sciences of the United States of America, 108, 4516, 10.1073/pnas.1000080107

Dunthorn, 2014, Placing environmental next-generation sequencing amplicons from microbial eukaryotes into a phylogenetic context, Molecular Biology and Evolution, 31, 993, 10.1093/molbev/msu055

Edgar, 2010, Search and clustering orders of magnitude faster than BLAST, Bioinformatics, 26, 2460, 10.1093/bioinformatics/btq461

Fu, 2012, CD-HIT: accelerated for clustering the next-generation sequencing data, Bioinformatics, 28, 3150, 10.1093/bioinformatics/bts565

Ghodsi, 2011, DNACLUST: accurate and efficient clustering of phylogenetic marker genes, BMC Bioinformatics, 12, 271, 10.1186/1471-2105-12-271

Gotoh, 1982, An improved algorithm for matching biological sequences, Journal of Molecular Biology, 162, 705, 10.1016/0022-2836(82)90398-9

Hubert, 1985, Comparing partitions, Journal of Classification, 2, 193, 10.1007/BF01908075

Huse, 2010, Ironing out the wrinkles in the rare biosphere through improved OTU clustering, Environmental Microbiology, 12, 1889, 10.1111/j.1462-2920.2010.02193.x

Karsenti, 2011, A holistic approach to marine eco-systems biology, PLoS Biology, 9, e1001177, 10.1371/journal.pbio.1001177

Koeppel, 2013, Surprisingly extensive mixed phylogenetic and ecological signals among bacterial Operational Taxonomic Units, Nucleic Acids Research, 41, 5175, 10.1093/nar/gkt241

Logares, 2014, Patterns of rare and abundant marine microbial eukaryotes, Current Biology, 24, 813, 10.1016/j.cub.2014.02.050

Masella, 2012, PANDAseq: paired-end assembler for illumina sequences, BMC Bioinformatics, 13, 31, 10.1186/1471-2105-13-31

Nebel, 2011, Delimiting operational taxonomic units for assessing ciliate environmental diversity using small-subunit rRNA gene sequences, Environmental Microbiology Reports, 3, 154, 10.1111/j.1758-2229.2010.00200.x

Needleman, 1970, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, 48, 443, 10.1016/0022-2836(70)90057-4

Rand, 1971, Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66, 846, 10.1080/01621459.1971.10482356

R Development Core Team, 2014, R: a language and environment for statistical computing

Rognes, 2011, Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation, BMC Bioinformatics, 12, 221, 10.1186/1471-2105-12-221

Schloss, 2009, Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities, Applied and Environmental Microbiology, 75, 7537, 10.1128/AEM.01541-09

Sellers, 1974, On the theory and computation of evolutionary distances, SIAM Journal on Applied Mathematics, 26, 787, 10.1137/0126070

Smith, 1981, Identification of common molecular subsequences, Journal of Molecular Biology, 147, 195, 10.1016/0022-2836(81)90087-5

Sogin, 2006, Microbial diversity in the deep sea and the underexplored “rare biosphere”, Proceedings of the National Academy of Sciences of the United States of America, 103, 12115, 10.1073/pnas.0605127103

Stackebrandt, 1994, Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology, International Journal of Systematic Bacteriology, 44, 846, 10.1099/00207713-44-4-846

Ukkonen, 1992, Approximate string-matching with q-grams and maximal matches, Theoretical Computer Science, 92, 191, 10.1016/0304-3975(92)90143-4

Wickham, 2009, ggplot2: elegant graphics for data analysis, 10.1007/978-0-387-98141-3