Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes

Nature Biotechnology - Volume 32 Issue 8 - Pages 822-828 - 2014
Henrik Bjørn Nielsen1,2, Mathieu Almeida3,4, Agnieszka Sierakowska Juncker1,2, Simon Rasmussen1, Junhua Li5,6, Shinichi Sunagawa7, Damian R. Plichta1, Laurent Gautier1, Anders Gorm Pedersen1, Emmanuelle Maguin4,8, Éric Pelletier9,10, Ida Bonde1,2, Trine Nielsen11, Chaysavanh Manichanh12, Manimozhiyan Arumugam13,7,11, Jean‐Daniel Zucker4,8, Marcelo Bertalan Quintanilha dos Santos1, Nikolaj Blom2, Natalia Borruel12, Kristoffer Sølvsten Burgdorf11, Fouad Boumezbeur4,8, Francesc Casellas12, Joël Doré4,8, Piotr Dworzyński1, Francisco Guarner12, Torben Hansen14,11, Falk Hildebrand15,16, Rolf Sommer Kaas17, Sean P. Kennedy8, Karsten Kristiansen13,18, Jens Roat Kultima7, Pierre Léonard4,8, Florence Levenez4,8, Ole Lund1, Bouziane Moumen4,8, Denis Le Paslier9,10, Joe-Élie Salem4,8, Oluf Pedersen19,20,11, Edi Prifti4,8, Junjie Qin21,13, Jeroen Raes15,22,23, Søren J. Sørensen24, Julien Tap7, Sebastian Tims25, David W. Ussery1, Takuji Yamada26,7, Pierre Renault4, Thomas Sicheritz‐Pontén1,2, Peer Bork7,27, Jun Wang13,18,11,28, Søren Brunak1,2, S. Dusko Ehrlich29,4,8, Alexandre Jamet4, Antonietta Cultrone4, Christine Delorme4, Éric Guédon4, Gaetana Vandemeulebrouck4, Ghalia Kaci4, Hervé M. Blottière4, Nicolás Sánchez4, Valentin Loux4, Séverine Layec4, Yohanan Winogradsky4
1Center for Biological Sequence Analysis
2Novo Nordisk Foundation Center for Biosustainability
3Department of Computer Science [Baltimore]
4MICrobiologie de l'ALImentation au Service de la Santé
5BGI Hong Kong Research Institute
6School of Bioscience and Biotechnology
7European Molecular Biology Laboratory
8MetaGenoPolis
9Genoscope - Centre national de séquençage [Evry]
10Université d'Évry-Val-d'Essonne
11Novo Nordisk Foundation Center for Basic Metabolic Research
12Digestive System Research Unit
13Beijing Genomics Institute [Shenzhen]
14Faculty of Health Sciences
15Department of Bioscience Engineering
16Department of Structural Biology
17National Food Institute - Division for Epidemiology and Microbial Genomics
18Department of Biology [Copenhagen]
19Faculty of Health
20Hagedorn Research Institute
21BGI Hong Kong Research Institute
22Rega Institute - Department of Microbiology and Immunology
23VIB Center for the Biology of Disease
24Section of Microbiology [Copenhagen]
25Laboratory of Microbiology
26Department of Biological Information
27Max Delbrück Center for Molecular Medicine, Berlin
28Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders
29Centre for Host-Microbiome Interactions, Dental Institute Central Office, Guy’s Hospital

References

Fodor, A.A. et al. The “most wanted” taxa from the human microbiome for whole genome sequencing. PLoS ONE 7, e41294 (2012).

Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).

Lukjancenko, O., Wassenaar, T.M. & Ussery, D.W. Comparison of 61 sequenced Escherichia coli genomes. Microb. Ecol. 60, 708–720 (2010).

Fitzsimons, M.S. et al. Nearly finished genomes produced using gel microdroplet culturing reveal substantial intraspecies genomic diversity within the human microbiome. Genome Res. 23, 878–888 (2013).

Pop, M. Genome assembly reborn: recent computational challenges. Brief. Bioinform. 10, 354–366 (2009).

Wooley, J.C., Godzik, A. & Friedberg, I. A primer on metagenomics. PLoS Comput. Biol. 6, e1000667 (2010).

Iverson, V. et al. Untangling genomes from metagenomes: revealing an uncultured class of marine Euryarchaeota. Science 335, 587–590 (2012).

Wang, Y., Leung, H.C.M., Yiu, S.M. & Chin, F.Y.L. MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample. Bioinformatics 28, i356–i362 (2012).

Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).

Raes, J. & Bork, P. Molecular eco-systems biology: towards an understanding of community function. Nat. Rev. Microbiol. 6, 693–699 (2008).

Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).

Reyes, A. et al. Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466, 334–338 (2010).

Minot, S. et al. The human gut virome: inter-individual variation and dynamic response to diet. Genome Res. 21, 1616–1625 (2011).

Stern, A., Mick, E., Tirosh, I., Sagy, O. & Sorek, R. CRISPR targeting reveals a reservoir of common phages associated with the human gut microbiome. Genome Res. 22, 1985–1994 (2012).

Zhang, Q., Rho, M., Tang, H., Doak, T.G. & Ye, Y. CRISPR-Cas systems target a diverse collection of invasive mobile genetic elements in human microbiomes. Genome Biol. 14, R40 (2013).

Chain, P.S.G. et al. Genome project standards in a new era of sequencing. Science 326, 236–237 (2009).

Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).

Chervaux, C. et al. Genome sequence of the probiotic strain Bifidobacterium animalis subsp. lactis CNCM I-2494. J. Bacteriol. 193, 5560–5561 (2011).

Terns, M.P. & Terns, R.M. CRISPR-based adaptive immune systems. Curr. Opin. Microbiol. 14, 321–327 (2011).

Kruschke, J.K. Bayesian data analysis. Wiley Interdiscip. Rev. Cogn. Sci. 1, 658–676 (2010).

Karch, H. et al. The enemy within us: lessons from the 2011 European Escherichia coli O104:H4 outbreak. EMBO Mol. Med. 4, 841–848 (2012).

Kultima, J.R. et al. MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS ONE 7, e47656 (2012).

Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).

Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, e132 (2010).

Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).

Leplae, R., Lima-Mendez, G. & Toussaint, A. ACLAME: a classification of mobile genetic elements, update 2010. Nucleic Acids Res. 38, D57–D61 (2010).

Finn, R.D., Clements, J. & Eddy, S.R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).

Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290–D301 (2012).

Kristensen, D.M., Cai, X. & Mushegian, A. Evolutionarily conserved orthologous families in phages are relatively rare in their prokaryotic hosts. J. Bacteriol. 193, 1806–1814 (2011).

Powell, S. et al. eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Res. 40, D284–D289 (2012).

Tringe, S.G. et al. Comparative metagenomics of microbial communities. Science 308, 554–557 (2005).

Roessner, C.A. & Scott, A.I. Fine-tuning our knowledge of the anaerobic route to cobalamin (vitamin B12). J. Bacteriol. 188, 7331–7334 (2006).

Bland, C. et al. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8, 209 (2007).

Zankari, E. et al. Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 67, 2640–2644 (2012).

Kobayashi, K. et al. Essential Bacillus subtilis genes. Proc. Natl. Acad. Sci. USA 100, 4678–4683 (2003).

Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

Kelley, D.R., Schatz, M.C. & Salzberg, S.L. Quake: quality-aware detection and correction of sequencing errors. Genome Biol. 11, R116 (2010).

Zerbino, D.R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).

Mavromatis, K. et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat. Methods 4, 495–500 (2007).

Earl, D. et al. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res. 21, 2224–2241 (2011).

Teeling, H., Meyerdierks, A., Bauer, M., Amann, R. & Glöckner, F.O. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ. Microbiol. 6, 938–947 (2004).

Salzberg, S.L. et al. GAGE: a critical evaluation of genome assemblies and assembly algorithms. Genome Res. 22, 557–567 (2012).

Koren, S., Treangen, T.J. & Pop, M. Bambus 2: scaffolding metagenomes. Bioinformatics 27, 2964–2971 (2011).

Ciccarelli, F.D. et al. Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287 (2006).

Letunic, I. & Bork, P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475–W478 (2011).

Treangen, T.J., Sommer, D.D., Angly, F.E., Koren, S. & Pop, M. Next generation sequence assembly with AMOS. Curr. Protoc. Bioinformatics Chapter 11, Unit 11.8 (2011).

Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

Gelman, A., Jakulin, A., Pittau, M.G. & Su, Y. A weakly informative default prior distribution for logistic and other regression models. Ann. Appl. Stat. 2, 1360–1383 (2008).

Plummer, M. JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. in Proc. 3rd Int. Workshop Distrib. Stat. Comput., March 20–22 (2003).

Gelman, A. & Rubin, D. Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–511 (1992).