Constructing Sequence Alignments from a Markov Decision Model with Estimated Parameter Values

Springer Science and Business Media LLC - Tập 3 - Trang 159-165 - 2012
Fern Y. Hunt1, Anthony J. Kearsley1, Agnes O’Gallagher1
1Mathematical and Computational Sciences Division, National Institute of Standards and Technology, Gaithersburg, USA

Tóm tắt

Current methods for aligning biological sequences are based on dynamic programming algorithms. If large numbers of sequences or a number of long sequences are to be aligned, the required computations are expensive in memory and central processing unit (CPU) time. In an attempt to bring the tools of large-scale linear programming (LP) methods to bear on this problem, we formulate the alignment process as a controlled Markov chain and construct a suggested alignment based on policies that minimise the expected total cost of the alignment. We discuss the LP associated with the total expected discounted cost and show the results of a solution of the problem based on a primal-dual interior point method. Model parameters, estimated from aligned sequences, along with cost function parameters are used to construct the objective and constraint conditions of the LP problem. This article concludes with a discussion of some alignments obtained from the LP solutions of problems with various cost function parameter values.

Tài liệu tham khảo

Korostensky C, Gonnet G. Gap heuristics and tree construction using gaps. Zurich: Institute of Scientific Computing, 1999 Apostolico A, Giancarlo R. Sequence alignment in molecular biology. J Comput Biol 1998; 5: 173–96 Durbin R, Eddy S, Krogh A, et al. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press, 1998 Needleman S, Wunsch C. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970; 48: 443–53 Waterman M. Introduction to computational biology. London: Chapman and Hall, 1995 Ross S. Applied probability models with optimization applications. San Francisco (CA): Holden-Day, 1970 Ross S. Probability models. 7th ed. San Francisco (CA): Harcourt Academic Press, 2000 Derman C. Finite state Markovian decision processes. New York: Academic Press, 1970 Altman E. Constrained Markov decision processes. London: Chapman and Hall/CRC Press, 1999 Chvatal V. Linear programming. New York: WH Freeman and Company, 1983 Borwein JM, Lewis AS. Convex analysis and nonlinear optimization: theory and examples. New York: Springer-Verlag, 2000 Gonnet G, Korostensky C. Optimal scoring matrices for estimating distances between aligned sequences [online]. Available from URL: http://www.inf.ethz.ch/personal/gonnet/index.html [Accessed 2004 Sep 15] Huelsenbeck J, Rannala B. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 1997; 276: 227–32 Czyzyk J, Mehrotra S, Wright S. PCx user guide technical report [report no.: OTC 96/01]. Evanston (IL): Optimization Technology Center, 1996 May Stoye J, Evers D, Meyer F. ROSE: generating sequence families. Bioinformatics 1998; 14: 157–63