Finding the region of pseudo-periodic tandem repeats in biological sequences

Springer Science and Business Media LLC - Volume 1 - Pages 1-8 - 2006
Xiaowen Liu¹, Lusheng Wang¹
¹Department of Computer Science, City University of Hong Kong, Hong Kong

Abstract

The genomes of many species are dominated by short sequences repeated consecutively. It is estimated that over 10% of the human genome consists of tandemly repeated sequences. Finding repeated regions in long sequences is therefore an important task in sequence analysis. We have developed a software tool, LocRepeat, that finds regions of pseudo-periodic repeats in a long sequence. We adopt the definition of the pseudo-periodic partition of a region given by Li et al. [1] and extend their algorithm so that it can select the repeated region from a given long sequence and output the pseudo-periodic partition of that region. LocRepeat is available at http://www.cs.cityu.edu.hk/~lwang/software/LocRepeat
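To make the notion of a tandemly repeated region concrete, the sketch below scans a sequence for maximal stretches that are exactly periodic, i.e. consecutive copies of a unit of length p (the last copy possibly partial). It is a deliberately simplified illustration, not LocRepeat and not the pseudo-periodic partition algorithm of Li et al. [1]: it finds only exact repeats, whereas pseudo-periodic repeats allow the copies to differ. The function name `find_exact_tandem_runs` and its parameters `max_period` and `min_copies` are hypothetical, introduced only for this example.

```python
def _is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a power of a shorter string (e.g. "ACAC" = "AC" * 2)."""
    # Classic trick: u is primitive iff its first non-trivial occurrence in u+u is at len(u).
    return (unit + unit).find(unit, 1) == len(unit)


def find_exact_tandem_runs(seq: str, max_period: int, min_copies: int = 2):
    """Hypothetical helper: return (start, end, period) triples such that
    seq[start:end] is a maximal stretch with seq[j] == seq[j + period]
    for every start <= j < end - period, and contains at least
    `min_copies` full copies of the repeat unit seq[start:start + period].

    Only exact repeats are found; this is NOT the pseudo-periodic method."""
    results = []
    n = len(seq)
    for p in range(1, max_period + 1):
        j = 0
        while j + p < n:
            if seq[j] != seq[j + p]:
                j += 1
                continue
            # Extend the run of positions that match the character p steps ahead.
            start = j
            while j + p < n and seq[j] == seq[j + p]:
                j += 1
            end = j + p  # seq[start:end] has period p
            unit = seq[start:start + p]
            # Skip units like "ACAC": the same region is reported with period 2.
            if end - start >= min_copies * p and _is_primitive(unit):
                results.append((start, end, p))
    return results


if __name__ == "__main__":
    print(find_exact_tandem_runs("GGACACACACTT", max_period=3, min_copies=3))
```

For `"GGACACACACTT"` with `min_copies=3`, the only region reported is `(2, 10, 2)`, the four copies of `AC`; with the default `min_copies=2` the short runs `GG` and `TT` (period 1) are reported as well. The scan takes O(n · max_period) time, so it only illustrates the idea; the tolerance to mismatches and indels that real tandem repeats require is exactly what the pseudo-periodic definition in [1] adds.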

References

[1] Li L, Jin R, Kok P, Wan H: Pseudo-periodic partitions of biological sequences. Bioinformatics. 2004, 20: 295-306.
[2] Jaitly D, Kearney PE, Lin G, Ma B: Methods for reconstructing the history of tandem repeats and their application to the human genome. Journal of Computer and System Sciences. 2002, 65 (3): 494-507.
[3] Tang M, Waterman M, Yooseph S: Zinc finger gene clusters and tandem gene duplication. Proceedings of the Fifth Annual International Conference on Computational Biology (Montreal, Canada, April 22–25, 2001). ACM, 2001, 297-304.
[4] Landau GM, Schmidt JP: An algorithm for approximate tandem repeats. Proceedings of the Fourth Annual Symposium on Combinatorial Pattern Matching, LNCS 684. New York: Springer-Verlag, 1993, 120-133.
[5] Schmidt JP: All highest scoring paths in weighted grid graphs and their application to finding all repeats in strings. SIAM Journal on Computing. 1998, 27: 972-992.
[6] Wan H, Song E: Quasiperiodic biosequences and modulo incidence matrices. Proceedings of the 16th International Parallel and Distributed Processing Symposium. 2002, 280-
[7] Sim JS, Iliopoulos CS, Park K, Smyth WF: Approximate periods of strings. Theoretical Computer Science. 2001, 262: 557-568.
[8] Biou V, Yaremchuk A, Tykalo M, Cusack MS: The 2.9 Å crystal structure of T. thermophilus seryl-tRNA synthetase complexed with tRNA(Ser). Science. 1994, 263: 1404-1410.
[9] Vaara M: Eight bacterial proteins, including UDP-N-acetylglucosamine acyltransferase (LpxA) and three other transferases of Escherichia coli, consist of a six-residue periodicity theme. FEMS Microbiology Letters. 1992, 76: 249-254.
[10] Vuorio R, Harkonen T, Tolvanen M, Vaara M: The novel hexapeptide motif found in the acyltransferases LpxA and LpxD of lipid A biosynthesis is conserved in various bacteria. FEBS Letters. 1994, 337: 289-292.
[11] Raetz CRH, Roderick SL: A left-handed parallel beta helix in the structure of UDP-N-acetylglucosamine acyltransferase. Science. 1995, 270: 997-1000.
[12] Myers EW, Miller W: Optimal alignments in linear space. Computer Applications in the Biosciences. 1988, 4: 11-17.