Block Graphs in Practice

Mathematics in Computer Science, Volume 11, Issue 2, Pages 191-196, 2017
Gagie, Travis (1); Hoobin, Christopher (2); Puglisi, Simon J. (3)
(1) School of Computer Science and Telecommunications, Diego Portales University, Santiago, Chile
(2) Department of Computer Science, University of Helsinki, Helsinki, Finland
(3) School of CSIT, RMIT University, Melbourne, Australia

Abstract

Motivated by the rapidly increasing size of genomic databases, code repositories and versioned texts, several compression schemes have been proposed that work well on highly repetitive strings and also support fast random access: e.g., LZ-End, RLZ, GDC, augmented SLPs, and block graphs. Block graphs have good worst-case bounds, but it has been an open question whether they are practical. We describe an implementation of block graphs that, for several standard datasets, provides better compression and faster random access than competing schemes.
