Assembly algorithms for next-generation sequencing data

Genomics - Volume 95, Issue 6 - Pages 315-327 - 2010
Jason Miller¹, Sergey Koren¹, Granger Sutton¹
¹ J. Craig Venter Institute, 9704 Medical Center Drive, Rockville MD 20850-3343, USA

References

Sanger, 1980, Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing, J. Mol. Biol., 143, 161, 10.1016/0022-2836(80)90196-5

Staden, 1979, A strategy of DNA sequencing employing computer programs, Nucleic Acids Res., 6, 2601, 10.1093/nar/6.7.2601

Pop, 2009, Genome assembly reborn: recent computational challenges, Brief. Bioinform., 10, 354, 10.1093/bib/bbp026

Mardis, 2008, The impact of next-generation sequencing technology on genetics, Trends Genet., 24, 133, 10.1016/j.tig.2007.12.007

Morozova, 2008, Applications of next-generation sequencing technologies in functional genomics, Genomics, 92, 255, 10.1016/j.ygeno.2008.07.001

Strausberg, 2008, Emerging DNA sequencing technologies for human genomic medicine, Drug Discov. Today, 13, 569, 10.1016/j.drudis.2008.03.025

Pettersson, 2009, Generations of sequencing technologies, Genomics, 93, 105, 10.1016/j.ygeno.2008.10.003

Sanger, 1977, DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. U. S. A., 74, 5463, 10.1073/pnas.74.12.5463

Eid, 2009, Real-time DNA sequencing from single polymerase molecules, Science, 323, 133, 10.1126/science.1162986

Ewing, 1998, Base-calling of automated sequencer traces using phred. II. Error probabilities, Genome Res., 8, 186, 10.1101/gr.8.3.186

Huse, 2007, Accuracy and quality of massively parallel DNA pyrosequencing, Genome Biol., 8, R143, 10.1186/gb-2007-8-7-r143

Dohm, 2008, Substantial biases in ultra-short read data sets from high-throughput DNA sequencing, Nucleic Acids Res., 36, e105, 10.1093/nar/gkn425

Harismendy, 2009, Evaluation of next generation sequencing platforms for population targeted sequencing studies, Genome Biol., 10, R32, 10.1186/gb-2009-10-3-r32

Fleischmann, 1995, Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, Science, 269, 496, 10.1126/science.7542800

Adams, 2000, The genome sequence of Drosophila melanogaster, Science, 287, 2185, 10.1126/science.287.5461.2185

Siegel, 2000, Modeling the feasibility of whole genome shotgun sequencing using a pairwise end strategy, Genomics, 68, 237, 10.1006/geno.2000.6303

Phillippy, 2008, Genome assembly forensics: finding the elusive mis-assembly, Genome Biol., 9, R55, 10.1186/gb-2008-9-3-r55

Kececioglu, 2001, Separating repeats in DNA sequence assembly, 176

Whiteford, 2005, An analysis of the feasibility of short read sequencing, Nucleic Acids Res., 33, e171, 10.1093/nar/gni170

Rusch, 2007, The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific, PLoS Biol., 5, e77, 10.1371/journal.pbio.0050077

Mavromatis, 2007, Use of simulated data sets to evaluate the fidelity of metagenomic processing methods, Nat. Methods, 4, 495, 10.1038/nmeth1043

Wommack, 2008, Metagenomics: read length matters, Appl. Environ. Microbiol., 74, 1453, 10.1128/AEM.02181-07

Myers, 1995, Toward simplifying and accurately formulating fragment assembly, J. Comput. Biol., 2, 275, 10.1089/cmb.1995.2.275

Idury, 1995, A new algorithm for DNA sequence assembly, J. Comput. Biol., 2, 291, 10.1089/cmb.1995.2.291

Zerbino, 2008, Velvet: algorithms for de novo short read assembly using de Bruijn graphs, Genome Res., 18, 821, 10.1101/gr.074492.107

Pevzner, 2004, De novo repeat classification and fragment assembly, Genome Res., 14, 1786, 10.1101/gr.2395204

Zhi, 2006, Identifying repeat domains in large genomes, Genome Biol., 7, R7, 10.1186/gb-2006-7-1-r7

Fasulo, 2002, Efficiently detecting polymorphisms during the fragment assembly process, Bioinformatics, 18, S294, 10.1093/bioinformatics/18.suppl_1.S294

Nagarajan, 2009, Parametric complexity of sequence assembly: theory and applications to next generation sequencing, J. Comput. Biol., 16, 897, 10.1089/cmb.2009.0005

Pop, 2008, Bioinformatics challenges of new sequencing technology, Trends Genet., 24, 142, 10.1016/j.tig.2007.12.006

Warren, 2007, Assembling millions of short DNA sequences using SSAKE, Bioinformatics, 23, 500, 10.1093/bioinformatics/btl629

Warren, 2008, SSAKE 3.0: Improved speed, accuracy and contiguity

Dohm, 2007, SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing, Genome Res., 17, 1697, 10.1101/gr.6435207

Jeck, 2007, Extending assembly of short DNA sequences to handle error, Bioinformatics, 23, 2942, 10.1093/bioinformatics/btm451

Reinhardt, 2009, De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae, Genome Res., 19, 294, 10.1101/gr.083311.108

Goldberg, 2006, A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes, Proc. Natl. Acad. Sci. U. S. A., 103, 11240, 10.1073/pnas.0604351103

Myers, 2000, A whole-genome assembly of Drosophila, Science, 287, 2196, 10.1126/science.287.5461.2196

Batzoglou, 2002, ARACHNE: a whole-genome shotgun assembler, Genome Res., 12, 177, 10.1101/gr.208902

Jaffe, 2003, Whole-genome sequence assembly for mammalian genomes: Arachne 2, Genome Res., 13, 91, 10.1101/gr.828403

Huang, 2005, Generating a genome assembly with PCAP, Curr. Protoc. Bioinformatics, Chapter 11, Unit 11.3

Batzoglou, 2005, Algorithmic Challenges in Mammalian Genome Sequence Assembly

Pop, 2005, DNA sequence assembly algorithms

Sutton, 2007, Shotgun Fragment Assembly, 79

Wang, 1994, On the complexity of multiple sequence alignment, J. Comput. Biol., 1, 337, 10.1089/cmb.1994.1.337

Margulies, 2005, Genome sequencing in microfabricated high-density picolitre reactors, Nature, 437, 376, 10.1038/nature03959

Miller, 2008, Aggressive assembly of pyrosequencing reads with mates, Bioinformatics, 24, 2818, 10.1093/bioinformatics/btn548

Hernandez, 2008, De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer, Genome Res., 18, 802, 10.1101/gr.072033.107

Hossain, 2009, Crystallizing short-read assemblies around seeds, BMC Bioinformatics, 10, S16, 10.1186/1471-2105-10-S1-S16

Pevzner, 1989, l-Tuple DNA sequencing: computer analysis, J. Biomol. Struct. Dyn., 7, 63, 10.1080/07391102.1989.10507752

Pevzner, 2001, An Eulerian path approach to DNA fragment assembly, Proc. Natl. Acad. Sci. U. S. A., 98, 9748, 10.1073/pnas.171285098

Simpson, 2009, ABySS: A parallel assembler for short read sequence data, Genome Res., 19, 1117, 10.1101/gr.089532.108

Pevzner, 2001, Fragment assembly with double-barreled data, Bioinformatics, 17, S225, 10.1093/bioinformatics/17.suppl_1.S225

Chaisson, 2004, Fragment assembly with short reads, Bioinformatics, 20, 2067, 10.1093/bioinformatics/bth205

Chaisson, 2008, Short read fragment assembly of bacterial genomes, Genome Res., 18, 324, 10.1101/gr.7088808

Chaisson, 2009, De novo fragment assembly with short mate-paired reads: Does the read length matter?, Genome Res., 19, 336, 10.1101/gr.079053.108

Zerbino, 2009, Pebble and Rock Band: heuristic resolution of repeats and scaffolding in the Velvet short-read de novo assembler, PLoS ONE, 4, e8407, 10.1371/journal.pone.0008407

Butler, 2008, ALLPATHS: de novo assembly of whole-genome shotgun microreads, Genome Res., 18, 810, 10.1101/gr.7337908

MacCallum, 2009, ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads, Genome Biol., 10, R103, 10.1186/gb-2009-10-10-r103

Li, 2009, De novo assembly of human genomes with massively parallel short read sequencing, Genome Res., 20, 265, 10.1101/gr.097261.109

Li, 2009, The sequence and de novo assembly of the giant panda genome, Nature, 463, 311, 10.1038/nature08696

Li, 2009, Building the sequence map of the human pan-genome, Nat. Biotechnol., 28, 57, 10.1038/nbt.1596

Venter, 2001, The sequence of the human genome, Science, 291, 1304, 10.1126/science.1058040

DiGuistini, 2009, De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data, Genome Biol., 10, R94, 10.1186/gb-2009-10-9-r94

Schmidt, 2009, A fast hybrid short read fragment assembly algorithm, Bioinformatics, 25, 2279, 10.1093/bioinformatics/btp374

Sundquist, 2007, Whole-genome sequencing and assembly with high-throughput, short-read technologies, PLoS ONE, 2, e484, 10.1371/journal.pone.0000484

Myers, 2005, The fragment assembly string graph, Bioinformatics, 21, ii79, 10.1093/bioinformatics/bti1114

Medvedev, 2008, Ab initio whole genome shotgun assembly with mated short reads, Proceedings of the 12th Annual Research in Computational Molecular Biology Conference (RECOMB)

Li, 2008, SOAP: short oligonucleotide alignment program, Bioinformatics, 24, 713, 10.1093/bioinformatics/btn025

Li, 2009, SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics, 25, 1966, 10.1093/bioinformatics/btp336

Li, 2008, Mapping short DNA sequencing reads and calling variants using mapping quality scores, Genome Res., 18, 1851, 10.1101/gr.078212.108

Langmead, 2009, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biol., 10, R25, 10.1186/gb-2009-10-3-r25

Smith, 2008, Using quality scores and longer reads improves accuracy of Solexa read mapping, BMC Bioinformatics, 9, 128, 10.1186/1471-2105-9-128

Schatz, 2009, CloudBurst: Highly Sensitive Read Mapping with MapReduce, Bioinformatics, 25, 1363, 10.1093/bioinformatics/btp236

Rumble, 2009, SHRiMP: accurate mapping of short color-space reads, PLoS Comput. Biol., 5, e1000386, 10.1371/journal.pcbi.1000386

Weese, 2009, RazerS–fast read mapping with sensitivity control, Genome Res., 19, 1646, 10.1101/gr.088823.108

Chen, 2009, PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds, Bioinformatics, 25, 2514, 10.1093/bioinformatics/btp486

Hoffmann, 2009, Fast mapping of short sequences with mismatches, insertions and deletions using index structures, PLoS Comput. Biol., 5, e1000502, 10.1371/journal.pcbi.1000502

Schneeberger, 2009, Simultaneous alignment of short reads against multiple genomes, Genome Biol., 10, R98, 10.1186/gb-2009-10-9-r98

Zhao, 2009, BOAT: Basic Oligonucleotide Alignment Tool, BMC Genomics, 10, S2, 10.1186/1471-2164-10-S3-S2

McKernan, 2009, Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding, Genome Res., 19, 1527, 10.1101/gr.091868.109

Lin, 2008, ZOOM! Zillions of oligos mapped, Bioinformatics, 24, 2431, 10.1093/bioinformatics/btn416

Lee, 2009, MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions, Nat. Methods, 6, 473, 10.1038/nmeth.f.256

Hormozdiari, 2009, Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes, Genome Res., 19, 1270, 10.1101/gr.088633.108

Chen, 2009, BreakDancer: an algorithm for high-resolution mapping of genomic structural variation, Nat. Methods, 6, 677, 10.1038/nmeth.1363

Pop, 2004, Comparative genome assembly, Brief. Bioinform., 5, 237, 10.1093/bib/5.3.237

Hillier, 2008, Whole-genome sequencing and variant discovery in C. elegans, Nat. Methods, 5, 183, 10.1038/nmeth.1179

Salzberg, 2008, Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A, BMC Genomics, 9, 204, 10.1186/1471-2164-9-204