Fast and accurate short read alignment with Burrows–Wheeler transform

Bioinformatics - Volume 25, Issue 14 - Pages 1754-1760 - 2009

Heng Li¹, Richard Durbin¹

¹Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK

Abstract

Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for aligning longer reads, where indels may occur frequently. The speed of MAQ is also a concern when alignment is scaled up to the resequencing of hundreds of individuals.

Results: We implemented the Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignments in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after alignment can be performed with the open source SAMtools software package.

Availability: http://maq.sourceforge.net

Contact: [email protected]
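The abstract names backward search on the BWT as the core of BWA without showing it. The following is a minimal Python sketch of exact-match backward search only, not BWA's implementation: it builds the index by naive suffix sorting, uses half-open intervals, and does not handle the mismatches and gaps that BWA allows. The function names `bwt_index` and `backward_search` are hypothetical, chosen for illustration.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy BWT index by naive suffix sorting (illustrative only)."""
    text += "$"  # sentinel, assumed smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    # C[c]: number of characters in text strictly smaller than c
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


sa, C, occ = bwt_index("GOOGOL")
print(backward_search("GO", sa, C, occ))  # [0, 3]
```

Each pattern character narrows the suffix array interval using only the `C` and `occ` tables, so the number of search steps depends on the pattern length rather than on the reference length. A practical index would replace the naive construction and the full `occ` table with space-efficient structures.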
