Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3
Abstract
Culture-independent analyses of microbial communities have progressed dramatically in the last decade, particularly due to advances in methods for biological profiling via shotgun metagenomics. Opportunities for improvement continue to grow, with greater access to multi-omics data, microbial reference genomes, and strain-level diversity. To leverage these, we present bioBakery 3, a set of integrated and improved methods for taxonomic, strain-level, functional, and phylogenetic profiling of metagenomes, newly developed to build on the largest set of reference sequences now available. Compared with current alternatives, MetaPhlAn 3 increases the accuracy of taxonomic profiling, and HUMAnN 3 improves the profiling of functional potential and activity. These methods detected novel disease-microbiome links in applications to colorectal cancer (CRC; 1262 metagenomes) and inflammatory bowel disease (IBD; 1635 metagenomes and 817 metatranscriptomes). Strain-level profiling of an additional 4077 metagenomes with StrainPhlAn 3 and PanPhlAn 3 unraveled the phylogenetic and functional structure of a common gut microbe.