Toward almost closed genomes with GapFiller
Tóm tắt
De novo assembly is a commonly used application of next-generation sequencing experiments. The ultimate goal is to puzzle millions of reads into one complete genome, although draft assemblies usually result in a number of gapped scaffold sequences. In this paper we propose an automated strategy, called GapFiller, to reliably close gaps within scaffolds using paired reads. The method shows good results on both bacterial and eukaryotic datasets, allowing only few errors. As a consequence, the amount of additional wetlab work needed to close a genome is drastically reduced. The software is available at
http://www.baseclear.com/bioinformatics-tools/
.
Tài liệu tham khảo
Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2011, 18: 821-829.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I: ABySS: A parallel assembler for short read sequence data. Genome Res. 2009, 19: 1117-1123. 10.1101/gr.089532.108.
Li R, Fan W, Tian G, Zhu H, He L, Cai J, Huang Q, Cai Q, Li B, Bai Y, Zhang Z, Zhang Y, Wang W, Li J, Wei F, Li H, Jian M, Li J, Zhang Z, Nielsen R, Li D, Gu W, Yang Z, Xuan Z, Ryder OA, Leung FC, Zhou Y, Cao J, Sun X, Fu Y, et al: The sequence and de novo assembly of the giant panda genome. Nature. 2010, 463: 311-317. 10.1038/nature08696.
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W: Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011, 27: 578-579. 10.1093/bioinformatics/btq683.
Dayarian A, Michael TP, Sengupta AM: SOPRA: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics. 2010, 11: 345-10.1186/1471-2105-11-345.
Tsai IJ, Otto TD, Berriman M: Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol. 2010, 11: R41-10.1186/gb-2010-11-4-r41.
GapFiller. [http://www.baseclear.com/bioinformatics-tools/]
Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25-10.1186/gb-2009-10-3-r25.
Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093/bioinformatics/btp324.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
Kelley DR, Schatz MC, Salzberg SL: Quake: quality-aware detection and correction of sequencing errors. Genome Biol. 2010, 11: R116-10.1186/gb-2010-11-11-r116.
Salzberg SL, Phillippy AM, Zimin A, Puiu1 D, Magoc T, Koren S, Treangen TJ, Schatz MC, Delcher AL, Roberts M, Marçais G, Pop M, Yorke JA: GAGE: a critical evaluation of genome assemblies and assembly algorithms. Genome Res. 2012, 22: 557-567. 10.1101/gr.131383.111.
GAGE. [http://gage.cbcb.umd.edu/results/index.html]
MacCallum I, Przybylski D, Gnerre S, Burton J, Shlyakhter I, Gnirke A, Malek J, McKernan K, Ranade S, Shea TP, Williams L, Young S, Nusbaum C, Jaffe DB: ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol. 2009, 10: R103-10.1186/gb-2009-10-10-r103.
IMAGE. [http://sourceforge.net/projects/image2/]
SOAPdenovo. [http://soap.genomics.org.cn/soapdenovo.html]
