A Counterexample on Sample-Path Optimality in Stable Markov Decision Chains with the Average Reward Criterion
Abstract
This note concerns Markov decision chains evolving on a denumerable state space. Under standard continuity-compactness requirements, an explicit example shows that the Lyapunov function condition does not ensure the existence of a stationary policy that is optimal with respect to a strong sample-path average reward criterion.