A Counterexample on Sample-Path Optimality in Stable Markov Decision Chains with the Average Reward Criterion

Journal of Optimization Theory and Applications - Volume 163 - Pages 674-684 - 2013
Rolando Cavazos-Cadena1, Raúl Montes-de-Oca2, Karel Sladký3
1Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Mexico
2Departamento de Matemáticas, Universidad Autónoma Metropolitana, México, Mexico
3Institute of Information Theory and Automation, Praha 8, Czech Republic

Abstract

This note deals with Markov decision chains evolving on a denumerable state space. Under standard continuity-compactness requirements, an explicit example is provided to show that, with respect to a strong sample-path average reward criterion, the Lyapunov function condition does not ensure the existence of an optimal stationary policy.
