MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence

Journal of Microbiological Methods - Volume 94 - Pages 347-355 - 2013
Wei Chen (1,2), Yongmei Cheng (1), Clarence Zhang (3), Shaowu Zhang (1), Hongyu Zhao (2)
(1) College of Automation, Northwestern Polytechnical University, 710072, Xi’an, China
(2) Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, United States
(3) Keck Biotechnology Laboratory, Biostatistics Resource, Yale School of Medicine, New Haven, CT 06510, United States
