ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time
Abstract
Keywords
References
A Sboner, 2011, The real cost of sequencing: higher than you think!, Genome Biology, 12, 125, 10.1186/gb-2011-12-8-125
N Beerenwinkel, 2011, Ultra-deep sequencing for the analysis of viral populations, Current Opinion in Virology, 1, 413, 10.1016/j.coviro.2011.07.008
ML Sogin, 2006, Microbial diversity in the deep sea and the underexplored “rare biosphere”, Proceedings of the National Academy of Sciences, 103, 12115, 10.1073/pnas.0605127103
HE O’Brien, 2005, Fungal community analysis by large-scale sequencing of environmental samples, Applied and Environmental Microbiology, 71, 5544, 10.1128/AEM.71.9.5544-5550.2005
P López-García, 2001, Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton, Nature, 409, 603, 10.1038/35054537
Z Kan, 2010, Diverse somatic mutation patterns and pathway alterations in human cancers, Nature, 466, 869, 10.1038/nature09208
SD Boyd, 2009, Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing, Science Translational Medicine, 1, 12ra23
JM Di Bella, 2013, High throughput sequencing methods and analysis for microbiome research, Journal of Microbiological Methods, 95, 401, 10.1016/j.mimet.2013.08.011
SS Mande, 2012, Classification of metagenomic sequences: methods and challenges, Briefings in Bioinformatics, 13, 669, 10.1093/bib/bbs054
J Dröge, 2012, Taxonomic binning of metagenome samples generated by next-generation sequencing technologies, Briefings in Bioinformatics, 13, 646, 10.1093/bib/bbs031
W Li, 2006, Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22, 1658, 10.1093/bioinformatics/btl158
RC Edgar, 2010, Search and clustering orders of magnitude faster than BLAST, Bioinformatics, 26, 2460, 10.1093/bioinformatics/btq461
Y Sun, 2011, A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis, Briefings in Bioinformatics, 13, 107, 10.1093/bib/bbr009
W Chen, 2013, MSClust: A multi-seeds based clustering algorithm for microbiome profiling using 16S rRNA sequences, Journal of Microbiological Methods, 94, 347, 10.1016/j.mimet.2013.07.004
MJ Bonder, 2012, Comparing clustering and pre-processing in taxonomy analysis, Bioinformatics, 28, 2891, 10.1093/bioinformatics/bts552
J Peterson, 2009, The NIH Human Microbiome Project, Genome Research, 19, 2317, 10.1101/gr.096651.109
Y Cai, 2011, ESPRIT-Tree: Hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time, Nucleic Acids Research, 39, e95, 10.1093/nar/gkr349
X Wang, 2012, Secondary structure information does not improve OTU assignment for partial 16S rRNA sequences, The ISME Journal, 6, 1277, 10.1038/ismej.2011.187
J Barriuso, 2011, Estimation of bacterial diversity using next generation sequencing of 16S rDNA: a comparison of different workflows, BMC Bioinformatics, 12, 473, 10.1186/1471-2105-12-473
CF Olson, 1995, Parallel algorithms for hierarchical clustering, Parallel Computing, 21, 1313, 10.1016/0167-8191(95)00017-I
M Dash, 2004, Euro-Par 2004 Parallel Processing, 363
Z Feng, 2007, A parallel hierarchical clustering algorithm for PCs cluster system, Neurocomputing, 70, 809, 10.1016/j.neucom.2006.10.034
JFM Rodrigues, 2014, HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences, Bioinformatics, 30, 287, 10.1093/bioinformatics/btt657
Y Sun, 2009, ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences, Nucleic Acids Research, 37, e76, 10.1093/nar/gkp285
TD Nguyen, 2015, Efficient and Accurate OTU Clustering with GPU-Based Sequence Alignment and Dynamic Dendrogram Cutting, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12, 1060, 10.1109/TCBB.2015.2407574
Q Mao, 2015, Parallel Hierarchical Clustering in Linearithmic Time for Large-Scale Sequence Analysis, 2015 IEEE International Conference on Data Mining, 310
RC Edgar, 2004, MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research, 32, 1792, 10.1093/nar/gkh340
K Katoh, 2002, MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform, Nucleic Acids Research, 30, 3059, 10.1093/nar/gkf436
MN Price, 2010, FastTree 2–approximately maximum-likelihood trees for large alignments, PLoS ONE, 5, e9490, 10.1371/journal.pone.0009490
K Howe, 2002, QuickTree: building huge Neighbour-Joining trees of protein sequences, Bioinformatics, 18, 1546, 10.1093/bioinformatics/18.11.1546
MJ Quinn, 2004, Parallel Programming in C with MPI and OpenMP
RC Edgar, 2011, UCHIME improves sensitivity and speed of chimera detection, Bioinformatics, 27, 2194, 10.1093/bioinformatics/btr381
RC Edgar, 2013, UPARSE: Highly accurate OTU sequences from microbial amplicon reads, Nature Methods, 10, 996, 10.1038/nmeth.2604
PJ Turnbaugh, 2008, A core gut microbiome in obese and lean twins, Nature, 457, 480, 10.1038/nature07540
J Ye, 2006, BLAST: improvements for better sequence analysis, Nucleic Acids Research, 34, W6, 10.1093/nar/gkl164
JR Cole, 2005, The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis, Nucleic Acids Research, 33, D294
A Giongo, 2010, TaxCollector: modifying current 16S rRNA databases for the rapid classification at six taxonomic levels, Diversity, 2, 1015, 10.3390/d2071015
MJ Claesson, 2011, Composition, variability, and temporal stability of the intestinal microbiota of the elderly, Proceedings of the National Academy of Sciences, 108, 4586, 10.1073/pnas.1000097107
2012, Structure, function and diversity of the healthy human microbiome, Nature, 486, 207, 10.1038/nature11234
T Ding, 2014, Dynamics and associations of microbial community types across the human body, Nature, 509, 357, 10.1038/nature13178
AF Koeppel, 2013, Surprisingly extensive mixed phylogenetic and ecological signals among bacterial Operational Taxonomic Units, Nucleic Acids Research, gkt241
SL Westcott, 2015, De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units, PeerJ, 3, e1487, 10.7717/peerj.1487
A May, 2014, Unraveling the outcome of 16S rDNA-based taxonomy analysis through mock data and simulations, Bioinformatics, 30, 1530, 10.1093/bioinformatics/btu085
JM Flynn, 2015, Toward accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods, Ecology and Evolution, 5, 2252, 10.1002/ece3.1497
JR White, 2010, Alignment and clustering of phylogenetic markers-implications for microbial diversity studies, BMC Bioinformatics, 11, 1, 10.1186/1471-2105-11-152
X Wang, 2013, M-pick, a modularity-based method for OTU picking of 16S rRNA sequences, BMC Bioinformatics, 14, 1, 10.1186/1471-2105-14-43
C Lozupone, 2005, UniFrac: a new phylogenetic method for comparing microbial communities, Applied and Environmental Microbiology, 71, 8228, 10.1128/AEM.71.12.8228-8235.2005
F Corpet, 1988, Multiple sequence alignment with hierarchical clustering, Nucleic Acids Research, 16, 10881, 10.1093/nar/16.22.10881