On maximizing probabilities for over-performing a target for Markov decision processes
Springer Science and Business Media LLC - Pages 1-29 - 2023
Abstract
This paper studies the dual relation between risk-sensitive control and the large-deviation control problem of maximizing the probability of out-performing a target for Markov decision processes. To derive the desired duality, we apply a nonlinear extension of the Krein-Rutman theorem to characterize the optimal risk-sensitive value and prove that there exists an optimal policy that is stationary and deterministic. The right-hand derivative of this value function is used to characterize the targets for which the duality holds. It is proved that the optimal policy for the out-performing probability can be approximated by the optimal policy for the risk-sensitive control problem. The ranges of the right-hand and left-hand derivatives of the optimal risk-sensitive value function play an important role. Some essential differences between these two types of optimal control problems are presented.
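To make the objects named in the abstract concrete, here is a minimal Python sketch on a hypothetical finite MDP. It assumes the standard multiplicative Bellman (nonlinear Perron-Frobenius) equation e^{λ(θ)} h(s) = max_a e^{θ r(s,a)} Σ_t p(t|s,a) h(t) for the optimal risk-sensitive value λ(θ). It also assumes a unichain, aperiodic model so that relative value iteration converges, and uses the Pham-style dual rate −sup_{θ≥0} [θx − λ(θ)] as a candidate decay rate for the maximal out-performing probability P(S_n/n ≥ x). The function names, the data layout P[s][a][t] and R[s][a], the finite θ-grid and the finite-difference step are all illustrative assumptions. This is not the paper's algorithm, and the dual formula is only a candidate for targets x in the range characterized via the right-hand derivative.

```python
import math

# Hypothetical finite MDP layout (illustrative assumption, not from the paper):
#   P[s][a][t] = probability of moving from state s to state t under action a
#   R[s][a]    = one-step reward earned in state s under action a


def risk_sensitive_value(P, R, theta, iters=10_000, tol=1e-12):
    """Approximate lambda(theta), the optimal growth rate of
    (1/n) log E[exp(theta * (r_0 + ... + r_{n-1}))], by relative value
    iteration on the multiplicative Bellman equation
        exp(lambda) h(s) = max_a exp(theta r(s, a)) sum_t P[s][a][t] h(t).
    Assumes a unichain, aperiodic model so that the iteration converges.
    Returns (lambda, policy); policy[s] is a maximizing action, i.e. a
    stationary deterministic policy."""
    n = len(P)
    h = [1.0] * n          # positive initial guess for the eigenfunction
    lam = 0.0
    policy = [0] * n
    for _ in range(iters):
        new_h = []
        for s in range(n):
            best, best_a = -math.inf, 0
            for a in range(len(P[s])):
                val = math.exp(theta * R[s][a]) * sum(
                    p * h[t] for t, p in enumerate(P[s][a]))
                if val > best:
                    best, best_a = val, a
            new_h.append(best)
            policy[s] = best_a
        # Normalize at reference state 0: at the fixed point new_h[0] = exp(lambda).
        norm = new_h[0]
        new_h = [v / norm for v in new_h]
        new_lam = math.log(norm)
        converged = (abs(new_lam - lam) < tol and
                     max(abs(u - v) for u, v in zip(new_h, h)) < tol)
        h, lam = new_h, new_lam
        if converged:
            break
    return lam, policy


def right_derivative(P, R, theta, step=1e-6):
    """Forward-difference estimate of the right-hand derivative of lambda."""
    return (risk_sensitive_value(P, R, theta + step)[0]
            - risk_sensitive_value(P, R, theta)[0]) / step


def dual_rate(P, R, x, thetas):
    """Candidate decay rate of the maximal out-performing probability
    P(S_n / n >= x): -sup_{theta >= 0} [theta * x - lambda(theta)],
    with the supremum taken over a finite grid `thetas` (assumption)."""
    return -max(th * x - risk_sensitive_value(P, R, th)[0] for th in thetas)
```

On the toy chain P = [[[0.5, 0.5]], [[0.5, 0.5]]], R = [[1.0], [0.0]] (an uncontrolled Bernoulli(1/2) reward), λ(θ) = log((1 + e^θ)/2). Therefore right_derivative(P, R, 0.0) is close to 0.5, and dual_rate at x = 0.75 over a fine grid on [0, 5] approximates −(0.75 log 1.5 + 0.25 log 0.5) ≈ −0.1308. Using a θ-grid keeps the sketch dependency-free; a one-dimensional convex optimizer would be the natural replacement for larger models.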