SOAP2: an improved ultrafast tool for short read alignment

Bioinformatics - Volume 25, Issue 15 - Pages 1966-1967 - 2009
Ruiqiang Li1, Chang Yu1, Yingrui Li1, Tak‐Wah Lam1, Siu‐Ming Yiu1, Karsten Kristiansen1, Jun Wang1
1 Beijing Genomics Institute at Shenzhen, Shenzhen 518083, China; 2 Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, DK-5230, Denmark; 3 Department of Computer Science, University of Hong Kong, Hong Kong, China

Abstract

Summary: SOAP2 is a significantly improved version of the short oligonucleotide alignment program (SOAP) that both reduces computer memory usage and increases alignment speed to an unprecedented extent. We replaced the seed-based strategy for indexing the reference sequence in main memory with a Burrows-Wheeler Transform (BWT) compression index. Tested on the whole human genome, the new algorithm reduced memory usage from 14.7 GB to 5.4 GB and increased alignment speed 20- to 30-fold. SOAP2 supports both single-end and paired-end reads, and it now accepts multiple text and compressed file formats. A consensus builder has also been developed for consensus assembly and SNP detection from short reads aligned to a reference genome.

Availability: http://soap.genomics.org.cn
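The abstract's central idea is indexing the reference with a BWT compression index rather than seed tables. The sketch below illustrates, under simplifying assumptions, how a generic BWT-based (FM-index) lookup counts exact occurrences of a read in a reference: it builds the BWT naively by sorting rotations, precomputes the C and Occ tables, and runs backward search. This is a textbook construction, not SOAP2's actual index layout. The function names (`bwt`, `build_fm_index`, `count_occurrences`) are hypothetical, and inexact matching, paired-end handling, and the compression techniques behind the reported memory savings are not shown.

```python
"""Minimal FM-index sketch: BWT construction plus exact-match backward search.

Illustrative only: a textbook construction, not SOAP2's actual index.
"""
from collections import Counter


def bwt(text: str) -> str:
    """Return the Burrows-Wheeler transform of text with a '$' sentinel appended."""
    text += "$"  # unique terminator that sorts before A, C, G, T
    # Naive rotation sort for clarity; real aligners build suffix arrays instead.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str: str):
    """Precompute C (count of smaller symbols) and Occ (prefix rank) tables."""
    counts = Counter(bwt_str)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt_str[:i]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, symbol in enumerate(bwt_str):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (symbol == ch)
    return c_table, occ


def count_occurrences(pattern: str, c_table, occ, n: int) -> int:
    """Backward search: number of exact occurrences of pattern in the text."""
    lo, hi = 0, n  # half-open range of sorted rotations prefixed by the suffix seen so far
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGTAC"
index_bwt = bwt(reference)
c_table, occ = build_fm_index(index_bwt)
print(count_occurrences("ACG", c_table, occ, len(index_bwt)))  # 2
```

Backward search needs only the BWT string and the C and Occ tables, so the index avoids storing seed tables over the reference. A production index would typically store Occ only at sampled positions and pack the BWT into a few bits per base, rather than keeping a full rank array per symbol as this sketch does.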
