Finding the region of pseudo-periodic tandem repeats in biological sequences
Abstract
The genomes of many species are dominated by short sequences repeated consecutively. It is estimated that over 10% of the human genome consists of tandemly repeated sequences. Finding repeated regions in long sequences is an important task in sequence analysis. We developed LocRepeat, a software tool that finds regions of pseudo-periodic repeats in a long sequence. We use the definition of Li et al. [1] for the pseudo-periodic partition of a region and extend the algorithm so that it can select the repeated region from a given long sequence and give the pseudo-periodic partition of that region. LocRepeat is available at
http://www.cs.cityu.edu.hk/~lwang/software/LocRepeat
References
Li L, Jin R, Kok P, Wan H: Pseudo-periodic partitions of biological sequences. Bioinformatics. 2004, 20: 295-306.
Jaitly D, Kearney PE, Lin G, Ma B: Methods for reconstructing the history of tandem repeats and their application to the human genome. Journal of Computer and System Sciences. 2002, 65 (3): 494-507.
Tang M, Waterman M, Yooseph S: Zinc finger gene clusters and tandem gene duplication. Proceedings of the Fifth Annual International Conference on Computational Biology, April 22–25 2001. 2001, 297-304. Montreal, Canada, ACM
Landau GM, Schmidt JP: An algorithm for approximate tandem repeats. Proceedings of the Fourth Annual Symposium on Combinatorial Pattern Matching. 1993, 120-133. New York, LNCS 684, Springer-Verlag
Schmidt JP: All highest scoring paths in weighted grid graphs and their application to finding all repeats in strings. SIAM Journal on Computing. 1998, 27: 972-992.
Wan H, Song E: Quasiperiodic biosequences and modulo incidence matrices. Proceedings of the 16th International Parallel and Distributed Processing Symposium. 2002, 280-
Sim JS, Iliopoulos CS, Park K, Smyth WF: Approximate period of strings. Theoretical Computer Science. 2001, 262: 557-568.
Biou V, Yaremchuk A, Tykalo M, Cusack MS: The 2.9 Å crystal structure of T. thermophilus seryl-tRNA synthetase complexed with tRNA(Ser). Science. 1994, 263: 1404-1410.
Vaara M: Eight bacterial proteins, including UDP-N-acetylglucosamine acyltransferase (LpxA) and three other transferases of Escherichia coli, consist of a six-residue periodicity theme. FEMS Microbiology Letters. 1992, 76: 249-254.
Vuorio R, Harkonen T, Tolvanen M, Vaara M: The novel hexapeptide motif found in the acyltransferases LpxA and LpxD of lipid A biosynthesis is conserved in various bacteria. FEBS Letters. 1994, 337: 289-292.
Raetz CRH, Roderick SL: A left-handed parallel beta helix in the structure of UDP-N-acetylglucosamine acyltransferase. Science. 1995, 270: 997-1000.
Myers EW, Miller W: Optimal alignments in linear space. Computer Applications in the Biosciences. 1988, 4: 11-17.