Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Genome Research - Tập 18 Số 5 - Trang 821-829 - 2008

Daniel R. Zerbino¹, Ewan Birney¹

¹EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

Tóm tắt

We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25–50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of ∼8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.

Từ khóa

Tài liệu tham khảo

Batzoglou, S. (2005) in Encyclopedia of genomics, proteomics and bioinformatics, Algorithmic challenges in mammalian genome sequence assembly, ed Dunn, M. (John Wiley and Sons, New York) Part 4.

10.1101/gr.208902

10.1016/j.gde.2006.10.009

10.1093/bioinformatics/bti129

10.1093/bioinformatics/bth205

10.1101/gr.6435207

Gross, J.L. Yellen, J. (2004) Handbook of graph theory (CRC Press LLC, Boca Raton, FL).

10.1101/gr.2264004

10.1101/gr.1390403

Idury,, 1995, A new algorithm for DNA sequence assembly, J. Comput. Biol., 2, 291, 10.1089/cmb.1995.2.291

10.1038/35057062

10.1093/bioinformatics/btm451

10.1038/ng.2007.9

10.1126/science.1141319

10.1038/nmeth726

10.1016/0888-7543(88)90007-9

Margulies,, 2005, Genome sequencing in microfabricated high-density picolitre reactors, Nature, 437, 376, 10.1038/nature03959

10.1101/gr.3770505

10.1101/gr.731003

10.1093/bioinformatics/bti1114

10.1126/science.287.5461.2196

10.1073/pnas.171285098

Shah, M.K. Lee, H. Rogers, S.A. Touchman, J.W. (2004) Computational Systems Bioinformatics Conference, An exhaustive genome assembly algorithm using k-mers to indirectly perform n-squared comparisons in O(n) (IEEE, New York), pp 740â741.

10.1186/1471-2105-6-31

10.1371/journal.pone.0000484

10.1126/science.1058040

Warren,, 2007, Assembling millions of short DNA sequences using SSAKE, Bioinformatics, 4, 500, 10.1093/bioinformatics/btl629

10.1038/nature01262

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA