Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem

BioMed Research International - Tập 2014 - Trang 1-11 - 2014
Maan Haj Rachid1, Qutaibah M. Malluhi1, Mohamed Abouelhoda2,3
1KINDI Lab for Computing Research, Qatar University P.O. Box 2713, Doha, Qatar
2Center for Informatics Sciences, Nile University, Giza, Egypt
3Faculty of Engineering, Cairo University, Giza, Egypt

Tóm tắt

The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in the de novo genome assembly task, which is one of the major bioinformatics problems. Due to the large size of the input data, it is crucial to use fast and space efficient solutions. In this paper, we present a space-economical solution to this problem using the generalized Sadakane compressed suffix tree. Furthermore, we present a parallel algorithm to provide more speed for shared memory computers. Our sequential and parallel algorithms are optimized by exploiting features of the Sadakane compressed index data structure. Experimental results show that our solution based on the Sadakane’s compressed index consumes significantly less space than the ones based on noncompressed data structures like the suffix tree and the enhanced suffix array. Our experimental results show that our parallel algorithm is efficient and scales well with increasing number of processors.

Từ khóa


Tài liệu tham khảo

10.1016/0020-0190(92)90176-V

1999, Software: Practice and Experience, 29, 1149

10.1016/j.ipl.2009.10.015

10.1016/S1570-8667(03)00065-0

10.1101/gr.126953.111

10.1007/978-3-540-30213-1_23

10.1007/s00224-006-1198-x

10.1007/BF01206331

2005, Nordic Journal of Computing, 12, 40

10.1006/jagm.2000.1151

10.1145/1498698.1594228

10.1093/bioinformatics/bti1114

10.1093/bioinformatics/btl681

2011