Speeding up the detection of tandem repeats over the edit distance

Theoretical Computer Science - Tập 525 - Trang 103-110 - 2014
Dina Sokol1, Justin Tojeira2
1Department of Computer and Information Science, Brooklyn College of the City University of New York, 2900 Bedford Avenue, Brooklyn, NY 11210, United States
2Department of Computer Science, The Graduate Center of the City University of New York, 365 Fifth Avenue, New York, NY 10016, United States

Tài liệu tham khảo

Al-Hafeedh, 2012, A comparison of index-based lempel-ziv lz77 factorization algorithms, ACM Comput. Surv., 45, 5:1, 10.1145/2379776.2379781 Benson, 1999, Tandem repeats finder — a program to analyze DNA sequences, Nucleic Acids Res., 27, 573, 10.1093/nar/27.2.573 V. Boeva, V. Makeev, M. Régnier, SWAN: searching for highly divergent tandem repeats in DNA sequences and statistical significance, in: JOBIM’04, IEEE Computer Society, 2004, in: Proceedings JOBIM’04, Montréal. Chen, 2007, Fast and practical algorithms for computing all the runs in a string, vol. 4580, 307 Chen, 2008, Lempel-ziv factorization using less time & space, Math. Comput. Sci., 1, 605, 10.1007/s11786-007-0024-4 Crochemore, 2008, Computing longest previous factor in linear time and applications, Inform. Process. Lett., 106, 75, 10.1016/j.ipl.2007.10.006 Domaniç, 2007, A novel approach to the detection of genomic approximate tandem repeats in the levenshtein metric, J. Comput. Biol., 14, 873, 10.1089/cmb.2007.0018 Fischer, 2009, Faster entropy-bounded compressed suffix trees, Theoret. Comput. Sci., 410, 5354, 10.1016/j.tcs.2009.09.012 Gatchel, 2005, Diseases of unstable repeat expansion: Mechanisms and common principles, Nature Rev. Genet., 6, 743, 10.1038/nrg1691 Groult, 2004, Speeding up the detection of evolutive tandem repeats, Theoret. Comput. Sci., 310, 309, 10.1016/S0304-3975(03)00423-7 Gusfield, 1997 Jeffreys, 1993, DNA typing: approaches and applications, J. Forensic Sci. Soc., 3, 204, 10.1016/S0015-7368(93)73016-9 R. Kolpakov, G. Kucherov, Finding maximal repetitions in a word in linear time, in: Proc. of Symposium on Foundations of Computer Science (FOCS), 1999, pp. 596–604. Kolpakov, 2003, mreps: efficient and flexible detection of tandem repeats in DNA, Nucleic Acids Res., 31, 3672, 10.1093/nar/gkg617 Kolpakov, 2003, Finding approximate repetitions under hamming distance, Theoret. Comput. Sci., 1, 135, 10.1016/S0304-3975(02)00448-6 Kucherov, 2008, Approximate tandem repeats Landau, 1998, Incremental string comparison, SIAM J. Comput., 27, 557, 10.1137/S0097539794264810 Landau, 2001, An algorithm for approximate tandem repeats, J. Comput. Biol., 8, 1, 10.1089/106652701300099038 Landau, 1989, Fast parallel and serial approximate string matching, J. Algorithms, 10, 157, 10.1016/0196-6774(89)90010-2 Lempel, 1976, On the complexity of finite sequences, IEEE Trans. Inform. Theory, 22, 75, 10.1109/TIT.1976.1055501 Main, 1989, Detecting leftmost maximal periodicities, Discrete Appl. Math., 25, 145, 10.1016/0166-218X(89)90051-6 Main, 1984, An O(nlogn) algorithm for finding all repetitions in a string, J. Algorithms, 5, 422, 10.1016/0196-6774(84)90021-X Mirkin, 2008, DNA structures, repeat expansions and human hereditary disorders, Curr. Opin. Struct. Biol., 16, 351, 10.1016/j.sbi.2006.05.004 Parisi, 2003, STRING: finding tandem repeats in DNA sequences, Bioinformatics, 19, 1733, 10.1093/bioinformatics/btg268 Pellegrini, 2010, Trstalker: an efficient heuristic for finding fuzzy tandem repeats, Bioinformatics [ISMB], 26, 358, 10.1093/bioinformatics/btq209 Sadakane, 2003, New text indexing functionalities of the compressed suffix arrays, J. Algorithms, 48, 294, 10.1016/S0196-6774(03)00087-7 Sagot, 1998, Identifying satellites and periodic repetitions in biological sequences, J. Comput. Biol., 5, 539, 10.1089/cmb.1998.5.539 Sokol, 2007, Tandem repeats over the edit distance, Bioinformatics, 23, e30, 10.1093/bioinformatics/btl309 Spong, 2002, A near-extinction event in lynx: do microsatellite data tell the tale?, Conserv. Ecol., 6, 15, 10.5751/ES-00358-060115 Uform, 1993, Microsatellites and their application to population genetic studies, Curr. Opin. Genetics Dev., 3, 939, 10.1016/0959-437X(93)90017-J Ukkonen, 1983, On approximate string matching, vol. 158, 487 Usdin, 2008, The biological effects of simple tandem repeats: lessons from the repeat expansion diseases, Genome Res., 18, 1011, 10.1101/gr.070409.107 Wexler, 2004, Finding approximate tandem repeats in genomic sequences, 223 Ziv, 1977, A universal algorithm for sequential data compression, IEEE Trans. Inform. Theory, IT-23, 337, 10.1109/TIT.1977.1055714