Pitfalls of genotyping microbial communities with rapidly growing genome collections

Cell Systems - Volume 14 - Pages 160-176.e3 - 2023
Chunyu Zhao (1,2), Zhou Jason Shi (1,2), Katherine S. Pollard (1,2,3)
(1) Chan Zuckerberg Biohub, San Francisco, CA, USA
(2) Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
(3) Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA, USA
