Bounds and Transformations for Discounted Finite Markov Decision Chains

Operations Research - Tập 23 Số 4 - Trang 761-784 - 1975
Evan L. Porteus1
1Stanford University, Stanford, California

Tóm tắt

This paper develops new improved bounds on the optimal return function in finite state and action, infinite horizon, discounted stationary Markov decision chains. The bounds are obtained by solving a single-constraint, bounded-variable linear program. They can be used for algorithmic termination criteria and improved tests for suboptimal decisions. We show how to implement these tests so that little additional computational effort is required. We consider several transformations that can be used to convert a process into an equivalent one that may be easier to solve. We examine whether the transformations reduce the spectral radius and/or the norm (maximum row sum) of the process. Gauss-Seidel iteration and Jacobi iteration are shown to be special cases of the general transformations. Gauss-Seidel iteration is given additional consideration. Another special case not only preserves equality of the row sums and sparsity but, when applicable, can dramatically reduce the norm. It reduces each element in a column by that column's smallest element. Several possible computational approaches are applied to a small numerical example.

Từ khóa


Tài liệu tham khảo