Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons

PeerJ - Volume 5 - Page e2836
Ilias Lagkouvardos¹, Sandra E. Fischer¹, Neeraj Kumar¹, Thomas Clavel¹
¹ ZIEL - Core Facility Microbiome/NGS, Technical University of Munich, Freising, Germany

Abstract

The importance of 16S rRNA gene amplicon profiles for understanding the influence of microbes in a variety of environments, coupled with the steep reduction in sequencing costs, has led to a surge of microbial sequencing projects. The growing number of scientists and clinicians who want to use sequencing datasets can choose among a range of multipurpose software platforms, which can be intimidating for non-expert users. Among the options available for high-throughput 16S rRNA gene analysis, the R programming language and software environment for statistical computing stands out for its power and flexibility, and for the possibility to adhere to the most recent best practices and to adjust to individual project needs. Here we present the Rhea pipeline, a set of R scripts that encode a series of well-documented choices for the downstream analysis of Operational Taxonomic Unit (OTU) tables, including normalization steps, alpha- and beta-diversity analysis, taxonomic composition, statistical comparisons, and calculation of correlations. Rhea is primarily a straightforward starting point for beginners, but it can also serve as a framework for advanced users who can modify and expand the tool. As community standards evolve, Rhea will adapt to represent the current state of the art in microbial profile analysis in the clear and comprehensive way allowed by the R language. Rhea scripts and documentation are freely available at https://lagkouvardos.github.io/Rhea.
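To make the downstream steps named in the abstract more concrete, the following is a minimal sketch of three of them on a toy OTU table: normalization to relative abundances, an alpha-diversity estimate (Shannon effective number of species), and a beta-diversity distance (Bray-Curtis). It is not taken from the Rhea scripts, which are written in R. The table contents, the function names, the choice of total-sum scaling, and the choice of Bray-Curtis as the distance are all assumptions made for illustration only.

```python
import math

# Hypothetical toy OTU table: one entry per OTU, raw read counts per sample.
otu_table = {
    "OTU_1": {"sample_A": 50, "sample_B": 10},
    "OTU_2": {"sample_A": 30, "sample_B": 10},
    "OTU_3": {"sample_A": 20, "sample_B": 0},
}


def sample_counts(table, sample):
    """Extract the raw counts of every OTU for one sample."""
    return {otu: counts[sample] for otu, counts in table.items()}


def relative_abundance(counts):
    """Normalize raw counts so that they sum to 1 (total-sum scaling)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: count / total for otu, count in counts.items()}


def shannon_effective(rel):
    """Alpha diversity: exponential of the Shannon entropy."""
    entropy = -sum(p * math.log(p) for p in rel.values() if p > 0)
    return math.exp(entropy)


def bray_curtis(rel_a, rel_b):
    """Beta diversity: Bray-Curtis dissimilarity between two profiles."""
    otus = set(rel_a) | set(rel_b)
    numerator = sum(abs(rel_a.get(o, 0.0) - rel_b.get(o, 0.0)) for o in otus)
    denominator = sum(rel_a.get(o, 0.0) + rel_b.get(o, 0.0) for o in otus)
    return numerator / denominator


rel_a = relative_abundance(sample_counts(otu_table, "sample_A"))
rel_b = relative_abundance(sample_counts(otu_table, "sample_B"))

print("Shannon effective, sample_A:", round(shannon_effective(rel_a), 3))
print("Shannon effective, sample_B:", round(shannon_effective(rel_b), 3))
print("Bray-Curtis A vs B:", round(bray_curtis(rel_a, rel_b), 3))
```

In this toy table, sample_B holds two equally abundant OTUs, so its effective number of species is 2. Real analyses would typically also account for phylogeny and sequencing depth, which this sketch ignores.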
