Constructing Sequence Alignments from a Markov Decision Model with Estimated Parameter Values
Tóm tắt
Current methods for aligning biological sequences are based on dynamic programming algorithms. If large numbers of sequences or a number of long sequences are to be aligned, the required computations are expensive in memory and central processing unit (CPU) time. In an attempt to bring the tools of large-scale linear programming (LP) methods to bear on this problem, we formulate the alignment process as a controlled Markov chain and construct a suggested alignment based on policies that minimise the expected total cost of the alignment. We discuss the LP associated with the total expected discounted cost and show the results of a solution of the problem based on a primal-dual interior point method. Model parameters, estimated from aligned sequences, along with cost function parameters are used to construct the objective and constraint conditions of the LP problem. This article concludes with a discussion of some alignments obtained from the LP solutions of problems with various cost function parameter values.
Tài liệu tham khảo
Korostensky C, Gonnet G. Gap heuristics and tree construction using gaps. Zurich: Institute of Scientific Computing, 1999
Apostolico A, Giancarlo R. Sequence alignment in molecular biology. J Comput Biol 1998; 5: 173–96
Durbin R, Eddy S, Krogh A, et al. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press, 1998
Needleman S, Wunsch C. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970; 48: 443–53
Waterman M. Introduction to computational biology. London: Chapman and Hall, 1995
Ross S. Applied probability models with optimization applications. San Francisco (CA): Holden-Day, 1970
Ross S. Probability models. 7th ed. San Francisco (CA): Harcourt Academic Press, 2000
Derman C. Finite state Markovian decision processes. New York: Academic Press, 1970
Altman E. Constrained Markov decision processes. London: Chapman and Hall/CRC Press, 1999
Chvatal V. Linear programming. New York: WH Freeman and Company, 1983
Borwein JM, Lewis AS. Convex analysis and nonlinear optimization: theory and examples. New York: Springer-Verlag, 2000
Gonnet G, Korostensky C. Optimal scoring matrices for estimating distances between aligned sequences [online]. Available from URL: http://www.inf.ethz.ch/personal/gonnet/index.html [Accessed 2004 Sep 15]
Huelsenbeck J, Rannala B. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 1997; 276: 227–32
Czyzyk J, Mehrotra S, Wright S. PCx user guide technical report [report no.: OTC 96/01]. Evanston (IL): Optimization Technology Center, 1996 May
Stoye J, Evers D, Meyer F. ROSE: generating sequence families. Bioinformatics 1998; 14: 157–63