On maximizing probabilities for over-performing a target for Markov decision processes

Tanhao Huang1, Yanan Dai1, Jinwen Chen1
1Department of Mathematics, Tsinghua University, Beijing, China

Abstract

This paper studies the dual relation between risk-sensitive control and the large deviation control problem of maximizing the probability of out-performing a target for Markov decision processes. To derive the desired duality, we apply a non-linear extension of the Krein-Rutman theorem to characterize the optimal risk-sensitive value and prove that an optimal policy exists which is stationary and deterministic. The right-hand derivative of this value function is used to characterize the specific targets for which the duality holds. It is proved that the optimal policy for the “out-performing” probability can be approximated by the optimal one for the risk-sensitive control. The range of the right- and left-hand derivatives of the optimal risk-sensitive value function plays an important role. Some essential differences between these two types of optimal control problems are presented.
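For concreteness, a standard form of the duality the abstract refers to can be sketched as follows; the notation is illustrative and not taken from the paper. Assuming an accumulated reward \( S_n = \sum_{k=1}^{n} r(X_k, A_k) \) under a policy \( \pi \), the optimal risk-sensitive value and its Legendre transform are typically written as
\[
\Lambda(\theta) \;=\; \sup_{\pi} \limsup_{n\to\infty} \frac{1}{n} \log \mathbb{E}_{\pi}\!\left[ e^{\theta S_n} \right],
\qquad
\Lambda^{*}(c) \;=\; \sup_{\theta > 0} \bigl\{ \theta c - \Lambda(\theta) \bigr\},
\]
and the dual relation with the out-performing probability takes the form
\[
\sup_{\pi} \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}_{\pi}\!\left( \tfrac{S_n}{n} \ge c \right) \;=\; -\,\Lambda^{*}(c),
\]
which, in the spirit of the result described above, one expects to hold for targets \( c \) lying in the range of the right- and left-hand derivatives of \( \Lambda \).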
