Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3

eLife - Volume 10
Francesco Beghini1, Lauren J. McIver2, Aitor Blanco‐Míguez1, Léonard Dubois1, Francesco Asnicar1, Sagun Maharjan2,3, Ana Mailyan2,3, Paolo Manghi1, Matthias Scholz4, Andrew Maltez Thomas1, Mireia Vallés-Colomer1, George Weingart2,3, Yancong Zhang2,3, Moreno Zolfo1, Curtis Huttenhower2,3, Eric A. Franzosa2,3, Nicola Segata1,5,6
1 Department CIBIO, University of Trento, Trento, Italy
2 Harvard T.H. Chan School of Public Health, Boston, United States
3 The Broad Institute of MIT and Harvard, Cambridge, United States
4 Department of Food Quality and Nutrition, Research and Innovation Center, Edmund Mach Foundation, San Michele all'Adige, Italy
5 European Research Council
6 IEO, European Institute of Oncology IRCCS, Milan, Italy

Abstract

Culture-independent analyses of microbial communities have progressed dramatically in the last decade, particularly due to advances in methods for biological profiling via shotgun metagenomics. Opportunities for improvement continue to accelerate, with greater access to multi-omics, microbial reference genomes, and strain-level diversity. To leverage these, we present bioBakery 3, a set of integrated, improved methods for taxonomic, strain-level, functional, and phylogenetic profiling of metagenomes, newly developed to build on the largest set of reference sequences now available. Compared to current alternatives, MetaPhlAn 3 increases the accuracy of taxonomic profiling, and HUMAnN 3 improves that of functional potential and activity. These methods detected novel disease-microbiome links in applications to colorectal cancer (CRC; 1262 metagenomes) and inflammatory bowel disease (IBD; 1635 metagenomes and 817 metatranscriptomes). Strain-level profiling of an additional 4077 metagenomes with StrainPhlAn 3 and PanPhlAn 3 unraveled the phylogenetic and functional structure of the common gut microbe Ruminococcus bromii, previously described by only 15 isolate genomes. With open-source implementations and cloud-deployable reproducible workflows, the bioBakery 3 platform can help researchers deepen the resolution, scale, and accuracy of multi-omic profiling for microbial community studies.
