De novo assembly of human genomes with massively parallel short read sequencing

Genome Research - Volume 20, Issue 2 - Pages 265-272 - 2010
Ruiqiang Li1,2, Hongmei Zhu1, Jue Ruan1, Wubin Qian1, Xiaodong Fang1, Zhongbin Shi1, Yingrui Li1, Shengting Li1, Shan Gao1, Karsten Kristiansen1,2, Songgang Li1, Huanming Yang1, Jian Wang1, Jun Wang1,2
1Beijing Genomics Institute at Shenzhen, Shenzhen 518083, China
2Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark

Abstract

Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads they produce are very short, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and N50 scaffold sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
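The N50 values quoted in the abstract follow the standard assembly statistic: the length L such that contigs (or scaffolds) of length at least L together contain at least half of all assembled bases. The sketch below shows how such a figure is commonly computed. It is an illustration only, not the authors' code; the function name `n50` and the toy lengths are hypothetical and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    # Walk from the longest sequence down, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths in bases (not data from the paper).
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

Because the statistic weights long sequences by the bases they contain, a few long scaffolds can raise the scaffold N50 well above the contig N50, which is consistent with the gap between the contig and scaffold figures reported above.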
