Lossless filter for multiple repeats with bounded edit distance

Pierre Peterlongo1, Gustavo Akio Tominaga Sacomoto2, Alair Pereira do Lago3, Nadia Pisanti4, Marie‐France Sagot5,6,7
1Équipe-projet Symbiose, IRISA/CNRS, Campus de Beaulieu, Rennes, France
2Curso Experimental de Ciências Moleculares da Universidade de São Paulo, Brazil
3Instituto de Matemática e Estatística da Universidade de São Paulo, Brazil
4Dipartimento di Informatica Università di Pisa, Italy
5CNRS
6Univ. Lyon 1, Villeurbanne Cedex, France and Équipe-Projet BAMBOO, INRIA Rhône-Alpes, France
7Équipe BAOBAB, Laboratoire de Biométrie et Biologie Evolutive (UMR 5558)

Tóm tắt

Từ khóa


Tài liệu tham khảo

Lowe CB, Bejerano G, Haussler D: Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. National Academy of Sciences. 2007, 104 (19): 8005-8010.

Ugarkovic D: Functional elements residing within satellite DNA. EMBO reports. 2005, 6: 1035-1039.

Rigoutsos I, Huynh T, Miranda K, Tsirigos A, McHardy A, Platt D: Short blocks from the noncoding parts of the human genome have instances within nearly all known genes and relate to biological processes. National Academy of Science. 2006, 103 (17): 6605-6610.

Burkhardt S, Crauser A, Ferragina P, Lenhof HP, Vingron M: q-Gram Based Database Searching Using a Suffix Array (QUASAR). Proceedings of the third annual international conference on Computational molecular biology (Recomb 99). 1999, ACM Press,

Pevzner P, Waterman M: Multiple Filtration and Approximate Pattern Matching. Algorithmica. 1995, 13: 135-154.

Rasmussen K, Stoye J, Myers E: Efficient q-gram Filters for finding all epsilon-matches over a given length. Journal of Computational Biology. 2006, 13 (2): 296-308.

Ukkonen E: Approximate String-Matching with q-Grams and Maximal Matches. Theoretical Computer Science. 1992, 92: 191-211.

Ukkonen E: Finding approximate patterns in strings. Journal of Algorithms. 1985, 6: 132-137.

Li M, Ma B: PatternHunter II: Highly Sensitive and Fast Homology Search. Genome Informatics. 2003, 14: 164-175.

Kucherov G, Noé L, Roytberg M: Multiseed Lossless Filtration. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2005, 02: 51-61.

Sun Y, Buhler J: Choosing the Best Heuristic for Seeded Alignment of DNA Sequences. BMC Bioinformatics. 2006, 7: 133-

Kolpakov R, Bana G, Kucherov G: MREPS: Efficient and Flexible Detection of Tandem Repeats in DNA. Nucleic Acid Research. 2003, 31 (13): 3672-3678.

Brudno M, Do CB, Cooper GM, Kim M, Davydov E, Green ED, Sidow A, Batzoglou S: LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA. Genome Research. 2003, 13: 721-731.

Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B: Fast and Sensitive Multiple Alignment of Large Genomic Sequences. BMC Bioinformatics. 2003, 4: 66-

Peterlongo P, Pisanti N, Boyer F, Sagot MF: Lossless Filter for Finding Long Multiple Approximate Repetitions Using a New Data Structure, the Bi-factor Array. String Processing and Information Retrieval (SPIRE 2005) 3772 of LNCS. 2005, 179-190.

Peterlongo P, Pisanti N, Boyer F, do Lago AP, Sagot MF: Lossless filter for multiple repetitions with Hamming distance. Journal of Discrete Algorithms. 2008, 6 (3): 497-509.

Hunt JW, Szymanski TG: A Fast Algorithm for Computing Longest Common Subsequences. CACM. 1977, 20 (5): 350-353.

Gusfield D: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. 1997, Cambridge University Press

Tettelin H: Complete Genome Sequence of Neisseria Meningitidis Serogroup B Strain MC58. Science. 2000, 287 (5459): 1809-1815.

Frith MC, Hansen U, Spouge JL, Weng Z: Finding functional sequence elements by multiple local alignment. Nucleic Acids Research. 2004, 32: 189-200.