A taxonomy of suffix array construction algorithms

ACM Computing Surveys - Tập 39 Số 2 - Trang 4 - 2007
Simon J. Puglisi1, W.F. Smyth2, Andrew Turpin3
1Curtin University of Technology, Melbourne, Australia
2McMaster University and Curtin University of Technology, Hamilton, ON, Canada
3RMIT University, Melbourne, Australia

Tóm tắt

In 1990, Manber and Myers proposed suffix arrays as a space-saving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few years, suffix array construction algorithms have proliferated in bewildering abundance. This survey paper attempts to provide simple high-level descriptions of these numerous algorithms that highlight both their distinctive features and their commonalities, while avoiding as much as possible the complexities of implementation details. New hybrid algorithms are also described. We provide comparisons of the algorithms' worst-case time complexity and use of additional space, together with results of recent experimental test runs on many of their implementations.

Từ khóa


Tài liệu tham khảo

10.1016/S1570-8667(03)00065-0

Apostolico , A. 1985. The myriad virtues of subword trees . In Combinatorial Algorithms on Words. NATO ASI Series F12 . Springer-Verlag , Berlin, Germany , 85--96. Apostolico, A. 1985. The myriad virtues of subword trees. In Combinatorial Algorithms on Words. NATO ASI Series F12. Springer-Verlag, Berlin, Germany, 85--96.

10.1109/TC.2005.56

10.1002/spe.4380231105

Bentley , J. L. and Sedgewick , R . 1997. Fast algorithms for sorting and searching strings . In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms ( New Orleans, LA). ACM, New York, 360--369. Bentley, J. L. and Sedgewick, R. 1997. Fast algorithms for sorting and searching strings. In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (New Orleans, LA). ACM, New York, 360--369.

Burkhardt , S. and Kärkkäinen , J . 2003. Fast lightweight suffix array construction and checking . In Proceedings of the 14th Annual Symposium CPM 2003, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science , vol. 2676 . Springer-Verlag, Berlin, Germany, 55--69. Burkhardt, S. and Kärkkäinen, J. 2003. Fast lightweight suffix array construction and checking. In Proceedings of the 14th Annual Symposium CPM 2003, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 55--69.

Burrows , M. and Wheeler , D. J . 1994 . A block sorting lossless data compression algorithm. Tech. Rep. 124 , Digital Equipment Corporation, Palo Alto, CA . Burrows, M. and Wheeler, D. J. 1994. A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation, Palo Alto, CA.

10.1007/s00453-001-0051-5

10.5555/795663.796326

10.1145/301970.301973

10.1137/S0097539702402354

Hart M. 1997. Project Gutenberg. http://www.gutenberg.net. Hart M. 1997. Project Gutenberg. http://www.gutenberg.net.

Hon , W. , Sadakane , K. , and Sung , W . 2003. Breaking a time-and-space barrier in constructing full-text indices . In Proceedings of the 44th IEEE Symposium on Foundations of Computer Science (FOCS'03) . IEEE Computer Society Press, Los Alamitos, CA, 251--260. Hon, W., Sadakane, K., and Sung, W. 2003. Breaking a time-and-space barrier in constructing full-text indices. In Proceedings of the 44th IEEE Symposium on Foundations of Computer Science (FOCS'03). IEEE Computer Society Press, Los Alamitos, CA, 251--260.

Itoh , H. and Tanaka , H . 1999. An efficient method for in memory construction of suffix arrays . In Proceedings of the 6th Symposium on String Processing and Information Retrieval ( Cancun, Mexico). IEEE Computer Society, Los Alamitos, CA, 81--88. Itoh, H. and Tanaka, H. 1999. An efficient method for in memory construction of suffix arrays. In Proceedings of the 6th Symposium on String Processing and Information Retrieval (Cancun, Mexico). IEEE Computer Society, Los Alamitos, CA, 81--88.

Kärkkäinen , J. and Sanders , P . 2003. Simple linear work suffix array construction . In Proceedings of the 30th International Colloquium Automata, Languages and Programming. Lecture Notes in Computer Science , vol. 2971 . Springer-Verlag, Berlin, Germany, 943--955. Kärkkäinen, J. and Sanders, P. 2003. Simple linear work suffix array construction. In Proceedings of the 30th International Colloquium Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 2971. Springer-Verlag, Berlin, Germany, 943--955.

10.1145/1217856.1217858

10.1145/800152.804905

Kasai , T. , Lee , G. , Arimura , H. , Arikawa , S. , and Park , K . 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications . In Proceedings of the 12th Annual Symposium (CPM 2001 ). Lecture Notes in Computer Science , vol. 2089 . Springer-Verlag, Berlin, Germany, 181--192. Kasai, T., Lee, G., Arimura, H., Arikawa, S., and Park, K. 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th Annual Symposium (CPM 2001). Lecture Notes in Computer Science, vol. 2089. Springer-Verlag, Berlin, Germany, 181--192.

Khmelev D. V. 2003. Program suffsort version 0.1.6. http://www.math.toronto.edu/dkhmelev/PROGS/tacu/suffsort-eng.html. Khmelev D. V. 2003. Program suffsort version 0.1.6. http://www.math.toronto.edu/dkhmelev/PROGS/tacu/suffsort-eng.html.

10.1007/978-3-540-24838-5_23

Kim , D. K. , Sim , J. S. , Park , H. , and Park , K . 2003. Linear-time construction of suffix arrays . In Proceedings of the 14th Annual Symposium Combinatorial Pattern Matching, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science , vol. 2676 . Springer-Verlag, Berlin, Germany, 186--199. Kim, D. K., Sim, J. S., Park, H., and Park, K. 2003. Linear-time construction of suffix arrays. In Proceedings of the 14th Annual Symposium Combinatorial Pattern Matching, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 186--199.

10.1016/j.jda.2004.08.019

Ko P. 2006. Linear time suffix array. http://www.public.iastate.edu/~kopang/progRelease/homepage.html. Ko P. 2006. Linear time suffix array. http://www.public.iastate.edu/~kopang/progRelease/homepage.html.

Ko , P. and Aluru , S . 2003. Space efficient linear time construction of suffix arrays . In Proceedings of the 14th Annual Symposium CPM 2003, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science , vol. 2676 . Springer-Verlag, Berlin, Germany, 200--210. Ko, P. and Aluru, S. 2003. Space efficient linear time construction of suffix arrays. In Proceedings of the 14th Annual Symposium CPM 2003, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 200--210.

10.1016/j.jda.2004.08.002

10.1002/(SICI)1097-024X(199911)29:13%3C1149::AID-SPE274%3E3.0.CO;2-O

Larsson , J. N. and Sadakane , K . 1999 . Faster suffix sorting. Tech. Rep. LU-CS-TR:99-214 {LUNFD6/(NFCS-3140)/1-20/(1999)}, Department of Computer Science , Lund University , Sweden . Larsson, J. N. and Sadakane, K. 1999. Faster suffix sorting. Tech. Rep. LU-CS-TR:99-214 {LUNFD6/(NFCS-3140)/1-20/(1999)}, Department of Computer Science, Lund University, Sweden.

Lee , S. and Park , K . 2004. Efficient implementations of suffix array construction algorithms . In AWOCA 2004: Proceedings of the 15th Australasian Workshop on Combinatorial Algorithms, S. Hong, Ed. 64--72 . Lee, S. and Park, K. 2004. Efficient implementations of suffix array construction algorithms. In AWOCA 2004: Proceedings of the 15th Australasian Workshop on Combinatorial Algorithms, S. Hong, Ed. 64--72.

Malyshev D. 2006. DARK the universal archiver based on BWT-DC scheme. http://darchiver.narod.ru/. Malyshev D. 2006. DARK the universal archiver based on BWT-DC scheme. http://darchiver.narod.ru/.

Manber , U. and Myers , G. W . 1990. Suffix arrays: A new method for on-line string searches . In Proceedings of the 1st ACM-SIAM Symposium on Discrete Algorithms. ACM , New York, 319--327. Manber, U. and Myers, G. W. 1990. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, 319--327.

10.1137/0222058

Maniscalco M. A. 2005. MSufSort. http://www.michael-maniscalco.com/msufsort.htm. Maniscalco M. A. 2005. MSufSort. http://www.michael-maniscalco.com/msufsort.htm.

Maniscalco , M. A. and Puglisi , S. J . 2006. Faster lightweight suffix array construction . In Proceedings of 17th Australasian Workshop on Combinatorial Algorithms, J. Ryan and Dafik, Eds . Univ. Ballavat, Ballavat, Victoria, Australia, 16--29. Maniscalco, M. A. and Puglisi, S. J. 2006. Faster lightweight suffix array construction. In Proceedings of 17th Australasian Workshop on Combinatorial Algorithms, J. Ryan and Dafik, Eds. Univ. Ballavat, Ballavat, Victoria, Australia, 16--29.

10.1145/1227161.1278374

10.1007/978-3-540-27810-8_32

10.1007/s00453-004-1094-1

McIlroy M. D. 1997. ssort.c. http://cm.bell-labs.com/cm/cs/who/doug/source.html. McIlroy M. D. 1997. ssort.c. http://cm.bell-labs.com/cm/cs/who/doug/source.html.

McIlroy , P. M. , Bostic , K. , and McIlroy , M. D. 1993 . Engineering radix sort . Comput. Syst. 6 , 1, 5 -- 27 . McIlroy, P. M., Bostic, K., and McIlroy, M. D. 1993. Engineering radix sort. Comput. Syst. 6, 1, 5--27.

Mori Y. 2006. DivSufSort. http://www.homepage3.nifty.com/wpage/software/libdivsufsort.html. Mori Y. 2006. DivSufSort. http://www.homepage3.nifty.com/wpage/software/libdivsufsort.html.

Munro , J. I. 1996. Tables . In Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS) . Lecture Notes in Computer Science , vol. 1180 . Springer-Verlag , London, UK , 37--42. Munro, J. I. 1996. Tables. In Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS). Lecture Notes in Computer Science, vol. 1180. Springer-Verlag, London, UK, 37--42.

10.1007/11496656_6

10.1145/1216370.1216372

10.1109/DCC.2005.87

10.5555/874052.874842

Schürmann , K. and Stoye , J . 2005. An incomplex algorithm for fast suffix array construction . In Proceedings of the 7th Workshop on Algorithm Engineering and Experiments (ALENEX05) . SIAM, 77--85. Schürmann, K. and Stoye, J. 2005. An incomplex algorithm for fast suffix array construction. In Proceedings of the 7th Workshop on Algorithm Engineering and Experiments (ALENEX05). SIAM, 77--85.

Seward , J. 2000 . On the performance of BWT sorting algroithms . In DCC: Data Compression Conference. IEEE Computer Society Press , Los Alamitos, CA, 173--182. Seward, J. 2000. On the performance of BWT sorting algroithms. In DCC: Data Compression Conference. IEEE Computer Society Press, Los Alamitos, CA, 173--182.

Sim , J. S. , Kim , D. K. , Park , H. , and Park , K . 2003. Linear-time search in suffix arrays . In Proceedings of the 14th Australasian Workshop on Combinatorial Algorithms, M. Miller and K. Park, Eds. ( Seoul, Korea), 139--146. Sim, J. S., Kim, D. K., Park, H., and Park, K. 2003. Linear-time search in suffix arrays. In Proceedings of the 14th Australasian Workshop on Combinatorial Algorithms, M. Miller and K. Park, Eds. (Seoul, Korea), 139--146.

10.1145/1005813.1041517

Smyth , B. 2003. Computing Patterns in Strings . Pearson Addison-Wesley , Essex, England . Smyth, B. 2003. Computing Patterns in Strings. Pearson Addison-Wesley, Essex, England.