Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges
Abstract
Keywords
References
TP Niedringhaus, 2011, Landscape of next-generation sequencing technologies, Anal Chem, 83, 4327, 10.1021/ac2010857
KV Voelkerding, 2009, Next-generation sequencing: from basic research to diagnostics, Clin Chem, 55, 641, 10.1373/clinchem.2008.112789
M Helmy, 2012, Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics, Genes Cells, 17, 633, 10.1111/j.1365-2443.2012.01615.x
M Helmy, 2011, Peptide identification by searching large-scale tandem mass spectra against large databases: bioinformatics methods in proteogenomics, Genes, Genomes and Genomics, 6, 76
X Zhou, 2010, The next-generation sequencing technology and application, Protein Cell, 1, 520, 10.1007/s13238-010-0065-3
L Liu, 2012, Comparison of next-generation sequencing systems, J Biomed Biotechnol, 2012, 251364
J Butler, 2008, ALLPATHS: de novo assembly of whole-genome shotgun microreads, Genome Res, 18, 810, 10.1101/gr.7337908
M Chaisson, 2004, Fragment assembly with short reads, Bioinformatics, 20, 2067, 10.1093/bioinformatics/bth205
MJ Chaisson, 2009, De novo fragment assembly with short mate-paired reads: does the read length matter?, Genome Res, 19, 336, 10.1101/gr.079053.108
MJ Chaisson, 2008, Short read fragment assembly of bacterial genomes, Genome Res, 18, 324, 10.1101/gr.7088808
S DiGuistini, 2009, De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data, Genome Biol, 10, R94, 10.1186/gb-2009-10-9-r94
JC Dohm, 2007, SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing, Genome Res, 17, 1697, 10.1101/gr.6435207
G Gonnella, 2012, Readjoiner: a fast and memory efficient string graph-based sequence assembler, BMC Bioinformatics, 13, 82, 10.1186/1471-2105-13-82
D Hernandez, 2008, De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer, Genome Res, 18, 802, 10.1101/gr.072033.107
M Hossain, 2009, Crystallizing short-read assemblies around seeds, BMC Bioinformatics, 10, S16, 10.1186/1471-2105-10-S1-S16
S Koren, 2012, Hybrid error correction and de novo assembly of single-molecule sequencing reads, Nat Biotechnol, 30, 693, 10.1038/nbt.2280
R Li, 2010, De novo assembly of human genomes with massively parallel short read sequencing, Genome Res, 20, 265, 10.1101/gr.097261.109
I Maccallum, 2009, ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads, Genome Biol, 10, R103, 10.1186/gb-2009-10-10-r103
M Margulies, 2005, Genome sequencing in microfabricated high-density picolitre reactors, Nature, 437, 376, 10.1038/nature03959
JR Miller, 2008, Aggressive assembly of pyrosequencing reads with mates, Bioinformatics, 24, 2818, 10.1093/bioinformatics/btn548
JR Miller, 2010, Assembly algorithms for next-generation sequencing data, Genomics, 95, 315, 10.1016/j.ygeno.2010.03.001
EW Myers, 2000, A whole-genome assembly of Drosophila, Science, 287, 2196, 10.1126/science.287.5461.2196
K Paszkiewicz, 2010, De novo assembly of short sequence reads, Brief Bioinform, 11, 457, 10.1093/bib/bbq020
PA Pevzner, 2001, Fragment assembly with double-barreled data, Bioinformatics, 17, S225, 10.1093/bioinformatics/17.suppl_1.S225
PA Pevzner, 2001, An Eulerian path approach to DNA fragment assembly, Proc Natl Acad Sci U S A, 98, 9748, 10.1073/pnas.171285098
JA Reinhardt, 2009, De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae, Genome Res, 19, 294, 10.1101/gr.083311.108
B Schmidt, 2009, A fast hybrid short read fragment assembly algorithm, Bioinformatics, 25, 2279, 10.1093/bioinformatics/btp374
JT Simpson, 2012, Efficient de novo assembly of large genomes using compressed data structures, Genome Res, 22, 549, 10.1101/gr.126953.111
JT Simpson, 2009, ABySS: a parallel assembler for short read sequence data, Genome Res, 19, 1117, 10.1101/gr.089532.108
Y Wang, 2012, Optimizing hybrid assembly of next-generation sequence data from Enterococcus faecium: a microbe with highly divergent genome, BMC Syst Biol, 6, 1, 10.1186/1752-0509-6-S3-S21
RL Warren, 2006, Assembling millions of short DNA sequences using SSAKE, Bioinformatics, 23, 500, 10.1093/bioinformatics/btl629
C Ye, 2012, Exploiting sparseness in de novo genome assembly, BMC Bioinformatics, 13, S1, 10.1186/1471-2105-13-S6-S1
DR Zerbino, 2008, Velvet: algorithms for de novo short read assembly using de Bruijn graphs, Genome Res, 18, 821, 10.1101/gr.074492.107
MA Quail, 2012, A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers, BMC Genomics, 13, 341, 10.1186/1471-2164-13-341
JM Miller, 2012, Short reads, circular genome: skimming solid sequence to construct the bighorn sheep mitochondrial genome, J Hered, 103, 140, 10.1093/jhered/esr104
NJ Loman, 2012, Performance comparison of benchtop high-throughput sequencing platforms, Nat Biotechnol, 30, 434, 10.1038/nbt.2198
X Yang, 2013, A survey of error-correction methods for next-generation sequencing, Brief Bioinform, 14, 56, 10.1093/bib/bbs015
Sequence Read Archive. Available: http://www.ncbi.nlm.nih.gov/sra. Accessed 4 February 2013.
Assembly Archive. Available: http://www.ncbi.nlm.nih.gov/Traces/assembly/. Accessed 4 February 2013.
AGP file. Available: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/. Accessed 4 February 2013.
W Zhang, 2011, A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies, PLoS ONE, 6, e17915, 10.1371/journal.pone.0017915
L Ilie, 2011, HiTEC: accurate error correction in high-throughput sequencing data, Bioinformatics, 27, 295, 10.1093/bioinformatics/btq653
WC Kao, 2011, ECHO: a reference-free short-read error correction algorithm, Genome Res, 21, 1181, 10.1101/gr.111351.110
DR Kelley, 2010, Quake: quality-aware detection and correction of sequencing errors, Genome Biol, 11, R116, 10.1186/gb-2010-11-11-r116
P Medvedev, 2011, Error correction of high-throughput sequencing datasets with non-uniform coverage, Bioinformatics, 27, i137, 10.1093/bioinformatics/btr208
L Salmela, 2011, Correcting errors in short reads by multiple alignments, Bioinformatics, 27, 1455, 10.1093/bioinformatics/btr170
J Schroder, 2009, SHREC: a short-read error correction method, Bioinformatics, 25, 2157, 10.1093/bioinformatics/btp379
X Yang, 2010, Reptile: representative tiling for short read error correction, Bioinformatics, 26, 2526, 10.1093/bioinformatics/btq468
M Boetzer, 2011, Scaffolding pre-assembled contigs using SSPACE, Bioinformatics, 27, 578, 10.1093/bioinformatics/btq683
A Dayarian, 2010, SOPRA: scaffolding algorithm for paired reads via statistical optimization, BMC Bioinformatics, 11, 345, 10.1186/1471-2105-11-345
N Donmez, 2013, SCARPA: scaffolding reads with practical algorithms, Bioinformatics, 29, 428, 10.1093/bioinformatics/bts716
S Gao, 2011, Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences, J Comput Biol, 18, 1681, 10.1089/cmb.2011.0170
S Koren, 2011, Bambus 2: scaffolding metagenomes, Bioinformatics, 27, 2964, 10.1093/bioinformatics/btr520
L Salmela, 2011, Fast scaffolding with small independent mixed integer programs, Bioinformatics, 27, 3259, 10.1093/bioinformatics/btr562
Z Li, 2012, Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph, Brief Funct Genomics, 11, 25, 10.1093/bfgp/elr035
S Gnerre, 2011, High-quality draft assemblies of mammalian genomes from massively parallel sequence data, Proc Natl Acad Sci U S A, 108, 1513, 10.1073/pnas.1017351108
H Li, 2012, Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly, Bioinformatics, 28, 1838, 10.1093/bioinformatics/bts280
Nagarajan N, Pop M (2010) Sequencing and genome assembly using next-generation technologies. In: Fenyö D, editor. Computational biology. Humana Press. pp. 1–17.
L Salmela, 2010, Correction of sequencing errors in a mixed set of reads, Bioinformatics, 26, 1284, 10.1093/bioinformatics/btq151
SL Salzberg, 2012, GAGE: a critical evaluation of genome assemblies and assembly algorithms, Genome Res, 22, 557, 10.1101/gr.131383.111
P Medvedev, 2009, Maximum likelihood genome assembly, J Comput Biol, 16, 1101, 10.1089/cmb.2009.0047
Medvedev P, Georgiou K, Myers G, Brudno M (2007) Computability of models for sequence assembly. In: Giancarlo R, Hannenhalli S, editors. Algorithms in bioinformatics. Springer Berlin Heidelberg. pp. 289–301.
H Peltola, 1984, SEQAID: a DNA sequence assembling program based on a mathematical model, Nucleic Acids Res, 12, 307, 10.1093/nar/12.1Part1.307
EW Myers, 2005, The fragment assembly string graph, Bioinformatics, 21, ii79, 10.1093/bioinformatics/bti1114
JT Simpson, 2010, Efficient construction of an assembly string graph using the FM-index, Bioinformatics, 26, i367, 10.1093/bioinformatics/btq217
RM Idury, 1995, A new algorithm for DNA sequence assembly, J Comput Biol, 2, 291, 10.1089/cmb.1995.2.291
A Charuvaka, 2011, Evaluation of short read metagenomic assembly, BMC Genomics, 12, S8, 10.1186/1471-2164-12-S2-S8
P Melsted, 2011, Efficient counting of k-mers in DNA sequences using a bloom filter, BMC Bioinformatics, 12, 1, 10.1186/1471-2105-12-333
TC Conway, 2011, Succinct data structures for assembling large genomes, Bioinformatics, 27, 479, 10.1093/bioinformatics/btq697
Bowe A, Onodera T, Sadakane K, Shibuya T (2012) Succinct de Bruijn Graphs. In: Raphael B, Tang J, editors. Algorithms in bioinformatics. Springer Berlin Heidelberg. pp. 225–235.
WR Jeck, 2007, Extending assembly of short DNA sequences to handle error, Bioinformatics, 23, 2942, 10.1093/bioinformatics/btm451
DW Bryant Jr, 2009, QSRA: a quality-value guided de novo short read assembler, BMC Bioinformatics, 10, 69, 10.1186/1471-2105-10-69
J-M Aury, 2008, High quality draft sequences for prokaryotic genomes using a mix of new sequencing technologies, BMC Genomics, 9, 1
LT Cerdeira, 2011, Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study, J Microbiol Methods, 86, 218, 10.1016/j.mimet.2011.05.008
J Nijkamp, 2010, Integrating genome assemblies with MAIA, Bioinformatics, 26, i433, 10.1093/bioinformatics/btq366
DR Zerbino, 2009, Pebble and Rock Band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler, PLoS ONE, 4, e8407, 10.1371/journal.pone.0008407
DH Huson, 2002, The greedy path-merging algorithm for contig scaffolding, Journal of the ACM, 49, 603, 10.1145/585265.585267
P Medvedev, 2011, Paired de bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers, J Comput Biol, 18, 1625, 10.1089/cmb.2011.0151
Medvedev P, Brudno M (2008) Ab initio whole genome shotgun assembly with mated short reads. Proceedings of the 12th annual international conference on research in computational molecular biology. Singapore: Springer-Verlag. pp. 50–64.
C Alkan, 2011, Limitations of next-generation genome sequence assembly, Nat Methods, 8, 61, 10.1038/nmeth.1527
G Golovko, 2012, Slim-Filter: an interactive windows-based application for illumina genome analyzer data assessment and manipulation, BMC Bioinformatics, 13, 166, 10.1186/1471-2105-13-166
DR Powell, 2013, VAGUE: a graphical user interface for the Velvet assembler, Bioinformatics, 29, 264, 10.1093/bioinformatics/bts664
DM Church, 2009, Lineage-specific biology revealed by a finished genome assembly of the mouse, PLoS Biol, 7, e1000112, 10.1371/journal.pbio.1000112
JK Colbourne, 2011, The ecoresponsive genome of Daphnia pulex, Science, 331, 555, 10.1126/science.1197761
R Li, 2010, The sequence and de novo assembly of the giant panda genome, Nature, 463, 311, 10.1038/nature08696
K Lindblad-Toh, 2005, Genome sequence, comparative analysis and haplotype structure of the domestic dog, Nature, 438, 803, 10.1038/nature04338
DP Locke, 2011, Comparative and demographic analysis of orang-utan genomes, Nature, 469, 529, 10.1038/nature09687
R Ming, 2008, The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus), Nature, 452, 991, 10.1038/nature06856
Y Lin, 2011, Comparative studies of de novo assembly tools for next-generation sequencing technologies, Bioinformatics, 27, 2031, 10.1093/bioinformatics/btr319
Huson DH, Halpern AL, Lai Z, Myers EW, Reinert K, et al. (2001) Comparing assemblies using fragments and mate-pairs. Århus, Denmark: Springer Berlin Heidelberg. pp. 294–306.
AM Phillippy, 2008, Genome assembly forensics: finding the elusive mis-assembly, Genome Biol, 9, R55, 10.1186/gb-2008-9-3-r55
S Zhou, 2007, Validation of rice genome sequence by optical mapping, BMC Genomics, 8, 278, 10.1186/1471-2164-8-278
G Parra, 2009, Assessing the gene space in draft genomes, Nucleic Acids Res, 37, 289, 10.1093/nar/gkn916
MJ Hubisz, 2011, Error and error mitigation in low-coverage genome assemblies, PLoS ONE, 6, e17034, 10.1371/journal.pone.0017034
S Meader, 2010, Genome assembly quality: assessment and improvement using the neutral indel model, Genome Res, 20, 675, 10.1101/gr.096966.109
D Earl, 2011, Assemblathon 1: a competitive assessment of de novo short read assembly methods, Genome Res, 21, 2224, 10.1101/gr.126599.111
N Haiminen, 2011, Evaluation of methods for de novo genome assembly from high-throughput sequencing reads reveals dependencies that affect the quality of the results, PLoS ONE, 6, e24182, 10.1371/journal.pone.0024182
G Narzisi, 2011, Comparing de novo genome assembly: the long and short of it, PLoS ONE, 6, e19175, 10.1371/journal.pone.0019175
F Vezzi, 2012, Feature-by-feature – evaluating de novo sequence assembly, PLoS ONE, 7, e31002, 10.1371/journal.pone.0031002
M Pop, 2009, Genome assembly reborn: recent computational challenges, Brief Bioinform, 10, 354, 10.1093/bib/bbp026
KR Bradnam, 2013, Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species, Gigascience, 2, 10, 10.1186/2047-217X-2-10
Sommerville I (1995) Software engineering (5th ed.). Addison Wesley Longman Publishing Co., Inc. 742 p.
J Goecks, 2012, NGS analyses by visualization with Trackster, Nat Biotechnol, 30, 1036, 10.1038/nbt.2404
H Li, 2009, The Sequence Alignment/Map format and SAMtools, Bioinformatics, 25, 2078, 10.1093/bioinformatics/btp352
SAM (Sequence Alignment/Map) format. Available: http://samtools.sourceforge.net/. Accessed 16 August 2013.
FASTG - An expressive representation for genome assemblies. Available: http://fastg.sourceforge.net/. Accessed 27 May 2013.