MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Bioinformatics (Oxford, England) - Volume 31, Issue 10 - Pages 1674-1676 - 2015
Dinghua Li1, Chi-Man Liu1, Ruibang Luo1, Kunihiko Sadakane1, Tak‐Wah Lam1
1 HKU-BGI Bioinformatics Algorithms Research Laboratory & Department of Computer Science, University of Hong Kong, Hong Kong; 2 L3 Bioinformatics Limited, Hong Kong; 3 National Institute of Informatics, Chiyoda-ku, Tokyo, Japan

Abstract

Summary: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It assembled a 252 Gbp soil metagenomics dataset on a single computing node in 44.1 h with a graphics processing unit (GPU) and in 99.6 h without one. MEGAHIT assembles the data as a whole, i.e. no pre-processing such as partitioning or normalization is needed. Compared with previous methods on the same soil data, MEGAHIT generated an assembly three times larger, with a longer contig N50 and a longer average contig length. Furthermore, 55.8% of the reads aligned to the assembly, a fourfold improvement.

Availability and implementation: The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under the GPLv3 license.

Supplementary information: Supplementary data are available at Bioinformatics online.
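
The comparison with previous methods rests on two standard assembly statistics: contig N50 and average contig length. The Python sketch below is an illustration only. It computes both statistics under the common definition of N50: the largest length L such that contigs of length at least L together contain at least half of all assembled bases. The function names `n50` and `mean_contig_length` and the toy contig lengths are hypothetical. They are not taken from MEGAHIT or from the paper's evaluation.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (hypothetical helper).

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float division.
        if 2 * running >= total:
            return length
    # Not reached: after the last contig, running == total.


def mean_contig_length(lengths):
    """Average contig length: total assembled bases / number of contigs."""
    if not lengths:
        raise ValueError("need at least one contig")
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Hypothetical toy assembly (20,000 bp in 8 contigs), not data from the paper.
    contigs = [8000, 4000, 3000, 2000, 1500, 800, 400, 300]
    print(n50(contigs))                 # 4000
    print(mean_contig_length(contigs))  # 2500.0
```

N50 weights each contig by its length. A few long contigs can therefore raise it even when many short contigs pull the plain average down, which is one reason the two statistics are often reported together.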
