phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data

PLoS ONE - Volume 8, Issue 4 - Page e61217
Paul J. McMurdie¹, Susan Holmes¹
¹ Department of Statistics, Stanford University, Stanford, California, United States of America

References

ML Metzker, 2010, Sequencing technologies - the next generation, Nature Reviews Genetics, 11, 31, 10.1038/nrg2626

M Hamady, 2008, Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex, Nature Methods, 5, 235, 10.1038/nmeth.1184

NR Pace, 1997, A molecular view of microbial diversity and the biosphere, Science, 276, 734, 10.1126/science.276.5313.734

Z Liu, 2008, Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers, Nucleic Acids Research, 36, e120, 10.1093/nar/gkn491

TZ DeSantis, 2006, NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes, Nucleic Acids Research, 34, W394, 10.1093/nar/gkl244

TZ DeSantis, 2006, Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB, Applied and Environmental Microbiology, 72, 5069, 10.1128/AEM.03006-05

JR Cole, 2009, The Ribosomal Database Project: improved alignments and new tools for rRNA analysis, Nucleic Acids Research, 37, D141, 10.1093/nar/gkn879

E Pruesse, 2007, SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB, Nucleic Acids Research, 35, 7188, 10.1093/nar/gkm864

W Li, 2006, CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22, 1658, 10.1093/bioinformatics/btl158

Y Huang, 2010, CD-HIT Suite: a web server for clustering and comparing biological sequences, Bioinformatics, 26, 680, 10.1093/bioinformatics/btq003

J Caporaso, 2010, QIIME allows analysis of high-throughput community sequencing data, Nature Methods, 7, 335, 10.1038/nmeth.f.303

PD Schloss, 2009, Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities, Applied and Environmental Microbiology, 75, 7537, 10.1128/AEM.01541-09

A Giongo, 2010, PANGEA: pipeline for analysis of next generation amplicons, The ISME Journal, 4, 852, 10.1038/ismej.2010.16

V Kunin, 2010, PyroTagger: A fast, accurate pipeline for analysis of rRNA amplicon pyrosequence data, The Open Journal

SV Angiuoli, 2011, CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing, BMC Bioinformatics, 12, 356, 10.1186/1471-2105-12-356

2011, The Genboree Microbiome Toolset and the Analysis of 16S rRNA Microbial Sequences. biotconf.org

QIIME EC2 image documentation. Available: http://qiime.org/svn_documentation/tutorials/working_with_aws.html. Accessed 2013 March 22.

University of Colorado Boulder Knight Lab. n3phele bioinformatics in the cloud. Available: http://www.n3phele.com/. Accessed 2013 March 22.

F Meyer, 2008, The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes, BMC Bioinformatics, 9, 386, 10.1186/1471-2105-9-386

JC Venter, 1998, Shotgun sequencing of the human genome, Science, 280, 1540, 10.1126/science.280.5369.1540

R Fleischmann, 1995, Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, Science, 269, 496, 10.1126/science.7542800

JC Venter, 2004, Environmental genome shotgun sequencing of the Sargasso Sea, Science, 304, 66, 10.1126/science.1093857

TJ Sharpton, 2011, PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data, PLoS Computational Biology, 7, e1001061, 10.1371/journal.pcbi.1001061

R Development Core Team (2011) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

Stroustrup B (2000) The C++ programming language. ISBN 0201700735. Addison-Wesley Professional, 3rd edition.

Chambers J (2008) Software for data analysis: programming with R. Springer Verlag.

Simpson GL. CRAN Task View: Analysis of Ecological and Environmental Data. Available: http://cran.r-project.org/web/views/Environmetrics.html. Accessed 2013 March 22.

J Chakerian, 2010, distory: Distances between trees

KP Schliep, 2011, phangorn: phylogenetic analysis in R, Bioinformatics, 27, 592, 10.1093/bioinformatics/btq706

SW Kembel, 2010, Picante: R tools for integrating phylogenies and ecology, Bioinformatics, 26, 1463, 10.1093/bioinformatics/btq166

PJ McMurdie, 2012, phyloseq: A Bioconductor Package for Handling and Analysis of High-Throughput Phylogenetic Sequence Data, Pacific Symposium on Biocomputing, 17, 235

Hardle W, Ronz B, editors (2002) Sweave. Dynamic generation of statistical reports using literate data analysis. Compstat 2002, Proceedings in Computational Statistics.

Y Xie, 2012, knitr: A general-purpose package for dynamic report generation in R, R package version 0.8

RC Gentleman, 2004, Bioconductor: open software development for computational biology and bioinformatics, Genome Biology, 5, R80, 10.1186/gb-2004-5-10-r80

D Beck, 2011, OTUbase: an R infrastructure package for operational taxonomic unit data, Bioinformatics

OTUbase Bioconductor Release Page. (2012) Available: http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html. Accessed 2013 March 22.

D McDonald, 2012, The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome, GigaScience

McMurdie PJ, Holmes S. Package manual for phyloseq. Available: http://bioconductor.org/packages/devel/bioc/manuals/phyloseq/man/phyloseq.pdf. Accessed 2013 March 22.

The phyloseq Homepage. Available: joey711.github.com/phyloseq/. Accessed 2013 March 22.

2012, Writing R Extensions, Comprehensive R Archive Network

Wickham H, Danenberg P, Eugster M. roxygen2: In-source documentation for R. R package version 2.2.2. Available: http://cran.r-project.org/web/packages/roxygen2/index.html. Accessed 2013 March 22.

D Faith, 1987, Compositional dissimilarity as a robust measure of ecological distance, Vegetatio, 69, 57, 10.1007/BF00038687

MJ Anderson, 2006, Multivariate dispersion as a measure of beta diversity, Ecology Letters, 9, 683, 10.1111/j.1461-0248.2006.00926.x

M Hamady, 2009, Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data, The ISME Journal

CA Lozupone, 2007, Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities, Applied and Environmental Microbiology, 73, 1576, 10.1128/AEM.01996-06

C Lozupone, 2005, UniFrac: a new phylogenetic method for comparing microbial communities, Applied and Environmental Microbiology, 71, 8228, 10.1128/AEM.71.12.8228-8235.2005

JG Caporaso, 2011, Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample, Proceedings of the National Academy of Sciences, 108, 4516, 10.1073/pnas.1000080107

Greenacre MJ (1984) Theory and Applications of Correspondence Analysis. London: Academic Press.

CJF Ter Braak, 1986, Canonical Correspondence Analysis: A new eigenvector technique for multivariate direct gradient analysis, Ecology, 67, 1167, 10.2307/1938672

M Hill, 1980, Detrended Correspondence Analysis, an improved ordination technique, Vegetatio, 42, 47, 10.1007/BF00048870

AL Wollenberg, 1977, Redundancy analysis an alternative for canonical correlation analysis, Psychometrika, 42, 207, 10.1007/BF02294050

H Hotelling, 1933, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, 24, 417, 10.1037/h0071325

S Pavoine, 2004, From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis, Journal of Theoretical Biology, 228, 523, 10.1016/j.jtbi.2004.02.014

JC Gower, 1966, Some distance properties of latent root and vector methods used in multivariate analysis, Biometrika, 53, 325, 10.1093/biomet/53.3-4.325

PR Minchin, 1987, An evaluation of the relative robustness of techniques for ecological ordination, Vegetatio, 69, 89, 10.1007/BF00038690

J Thioulouse, 2011, Simultaneous analysis of a sequence of paired ecological tables: A comparison of several methods, Annals of Applied Statistics, 5, 2300, 10.1214/10-AOAS372

Wickham H (2009) ggplot2: elegant graphics for data analysis. Springer New York.

Wilkinson L, Wills G (2005) The Grammar Of Graphics. Statistics and Computing. Springer, 2nd edition.

S Rajaram, 2010, NeatMap - non-clustering heat map alternatives in R, BMC Bioinformatics, 11, 45, 10.1186/1471-2105-11-45

G Csardi, 2006, The igraph software package for complex network research, InterJournal Complex Systems, 1695

Tufte ER (2001) The visual display of quantitative information, Graphics Press, Cheshire, Connecticut, chapter 9 Aesthetics and Technique in Data Graphical Design. 2nd edition, p. 178.

Greenacre M (2007) Correspondence analysis in practice. Chapman & Hall.

AJ Pinto, 2012, PCR Biases Distort Bacterial and Archaeal Community Structure in Pyrosequencing Datasets, PLoS ONE, 7, e43093, 10.1371/journal.pone.0043093

HL Sanders, 1968, Marine benthic diversity: A comparative study, The American Naturalist, 102, 243, 10.1086/282541

S Holmes, 2011, Visualization and statistical comparisons of microbial communities using R packages on phylochip data, Pacific Symposium on Biocomputing, 142

DB Allison, 2006, Microarray Data Analysis: from Disarray to Consolidation and Consensus, Nat Rev Genet, 7, 55, 10.1038/nrg1749

S Holmes, 2012, Statistical analysis challenges in the microbiome, To appear PNAS: The Social Biology of Microbial Communities forum on Microbial Threats

T Nelson, 2010, Shifts in luminal and mucosal microbial communities associated with an experimental model of irritable bowel syndrome, Gastroenterology

Efron B, Tibshirani R (1993) An introduction to the bootstrap, volume 57. Chapman & Hall/CRC.

S Holmes, 2003, Bootstrapping phylogenetic trees: theory and methods, Statistical Science, 241, 10.1214/ss/1063994979

PH Westfall, 1993, Resampling-Based Multiple Testing. Examples and Methods for P-Value Adjustment, Wiley-Interscience

KS Pollard, 2010, multtest: Resampling-based multiple hypothesis testing, R package version 2.4.0

JPA Ioannidis, 2005, Why most published research findings are false, PLoS Medicine, 2, e124, 10.1371/journal.pmed.0020124

Z Merali, 2010, Computational science: Error, why scientific programming does not compute, Nature, 467, 775

RD Peng, 2011, Reproducible research in computational science, Science, 334, 1226, 10.1126/science.1213847

DC Ince, 2012, The case for open computer programs, Nature, 482, 485, 10.1038/nature10836

Carey VJ, Stodden V (2010) Reproducible Research Concepts and Tools for Cancer Bioinformatics. In: Ochs MF, Casagrande JT, Davuluri RV, editors, Biomedical Informatics for Cancer Research, Boston, MA: Springer US. pp. 149–175.

R Knight, 2012, Unlocking the potential of metagenomics through replicated experimental design, Nature Biotechnology, 30, 513, 10.1038/nbt.2235

2012, Structure, function and diversity of the healthy human microbiome, Nature, 486, 207, 10.1038/nature11234

DL Donoho, 2010, An invitation to reproducible computational research, Biostatistics (Oxford, England), 11, 385, 10.1093/biostatistics/kxq028

RD Peng, 2009, Reproducible research and Biostatistics, Biostatistics (Oxford, England), 10, 405, 10.1093/biostatistics/kxp014

R Gentleman, 2004, Statistical analyses and reproducible research, Bioconductor Project Working Papers, 2

F Pérez, 2007, IPython: a System for Interactive Scientific Computing, Comput Sci Eng, 9, 21, 10.1109/MCSE.2007.53

Allaire J, Horner J, Marti V, Porte N. The markdown package: Markdown rendering for R. R package version 0.5.4. Available: http://CRAN.R-project.org/package=markdown. Accessed 2013 March 22.

R Gentleman, 2005, Reproducible research: a bioinformatics case study, Statistical applications in genetics and molecular biology, 4, Article2, 10.2202/1544-6115.1034

The phyloseq Demo Repository. Available: https://github.com/joey711/phyloseq-demo. Accessed 2013 March 22.

N Barnes, 2010, Publish your computer code: it is good enough, Nature, 467, 753, 10.1038/467753a

WK Copeland, 2012, mcaGUI: microbial community analysis R-Graphical User Interface (GUI), Bioinformatics (Oxford, England), 28, 2198, 10.1093/bioinformatics/bts338

H Wickham, 2007, Reshaping data with the reshape package, Journal of Statistical Software, 21, 1, 10.18637/jss.v021.i12

H Wickham, 2011, The split-apply-combine strategy for data analysis, Journal of Statistical Software, 40, 1, 10.18637/jss.v040.i01

M Arumugam, 2011, Enterotypes of the human gut microbiome, Nature, 473, 174, 10.1038/nature09944

J Oksanen, 2011, vegan: Community Ecology Package, R package version 1.17–10