Space-efficient and exact de Bruijn graph representation based on a Bloom filter

Rayan Chikhi1, Guillaume Rizk2
1Computer Science department, ENS Cachan / IRISA / INRIA, Rennes, 35042, France
2Algorizk, Paris, 75013, France

Tóm tắt

Từ khóa


Tài liệu tham khảo

Idury RM, Waterman MS: A new algorithm for DNA sequence assembly. J Comput Biol. 1995, 2 (2): 291-306.

Grabherr MG: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotech. 2011, 29 (7): 644-652. 10.1038/nbt.1883. [ http://dx.doi.org/10.1038/nbt.1883 ], []

Peng Y, Leung HCM, Yiu SM, Chin FYL: Meta-IDBA: a de Novo assembler for metagenomic data. Bioinformatics. 2011, 27 (13): i94-i101.

Peterlongo P, Schnel N, Pisanti N, Sagot MF, Lacroix V: Identifying SNPs without a reference genome by comparing raw reads. String Processing and Information Retrieval. Berlin, Heidelberg: Springer,2010, 147-158.

Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet. 2012, 44: 226-232.

Sacomoto G, Kielbassa J, Chikhi R, Uricaru R, Antoniou P, Sagot M, Peterlongo P, Lacroix V: KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. BMC Bioinformatics. 2012, 13 (Suppl 6): S5-[ http://www.biomedcentral.com/1471-2105/13/S6/S5 ].

Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20 (2): 265.

Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I: ABySS: A parallel assembler for short read sequence data. Genome Res. 2009, 19 (6): [ http://genome.cshlp.org/content/19/6/1117.abstract ],1117-1123.

Conway TC, Bromage AJ: Succinct data structures for assembling large genomes. Bioinformatics. 2011, 27 (4): 479.

Warren RL, Holt RA: Targeted assembly of short sequence reads. PloS One. 2011, 6 (5): e19816.

Peterlongo P, Chikhi R: Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinformatics. 2012, 13: 48.

Ye C, Ma Z, Cannon C, Pop M, Yu D: Exploiting sparseness in de novo genome assembly. BMC Bioinformatics. 2012, 13 (Suppl 6): S1-[ http://www.biomedcentral.com/1471-2105/13/S6/S1 ],

Pell J, Hintze A, Canino-Koning R, Howe A, Tiedje JM, Brown CT: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Arxiv preprint arXiv:1112.4193. 2011.

Kirsch A, Mitzenmacher M: Less hashing, same performance: Building a better Bloom filter. Algorithms–ESA. 2006, 4168: 456-467.

Miller JR, Koren S, Sutton G: Assembly algorithms for next-generation sequencing data. Genomics. 2010, 95 (6): 315-327.

Chikhi R, Lavenier D: Localized genome assembly from reads to scaffolds: practical traversal of the paired string graph. Algo Bioinformatics. 2011, 6833: 39-48. 10.1007/978-3-642-23038-7_4.

Kingsford C, Schatz MC, Pop M: Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics. 2010, 11: 21.

Marçais G, Kingsford C: A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics.[ http://bioinformatics.oxfordjournals.org/content/27/6/764.abstract ],2011, 27 (6): 764-770.

Rizk G, Lavenier D, Chikhi R: DSK: k-mer counting with very low memory usage. Bioinformatics. 2013, 29 (5): 652-653.

Rizk G, Lavenier D: GASSST: global alignment short sequence search tool. Bioinformatics. 2010, 26 (20): 2534.

Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz MC, Delcher AL, Roberts M, Marçais G, Pop M, Yorke JA: GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 2012, 22 (3): 557-567. [ http://genome.cshlp.org/content/22/3/557.abstract ], []

Chazelle B, Kilian J, Rubinfeld R, Tal A: The Bloomier filter: an efficient data structure for static support lookup tables. Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms. 2004, 30-39. Philadelphia: SIAM

Bowe A, Onodera T, Sadakane K, Shibuya T: Succinct de Bruijn Graphs. Algorithms in Bioinformatics, Volume 7534 of Lecture Notes in Computer Science. Edited by: Raphael B, Tang J.[ http://dx.doi.org/10.1007/978-3-642-33122-0_18 ], Berlin, Heidelberg: Springer, 2012, 225-235.