On tight bounds for function approximation error in risk-sensitive reinforcement learning

Systems and Control Letters - Volume 150 - Page 104899 - 2021
Prasenjit Karmakar1, Shalabh Bhatnagar1
1Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India

References

Kirkwood, 1991
Basu, 2008, A learning algorithm for risk-sensitive cost, Math. Oper. Res., 55, 139
Ruszczyński, 2010, Risk-averse dynamic programming for Markov decision processes, Math. Program., 125, 235, 10.1007/s10107-010-0393-3
Bäuerle, 2011, Markov decision processes with average-value-at-risk criteria, Math. Methods Oper. Res., 74, 361, 10.1007/s00186-011-0367-0
Ruszczyński, 2006
Pichler, 2017, A quantitative comparison of risk measures, Ann. Oper. Res., 254, 251, 10.1007/s10479-017-2397-3
Balaji, 2000, Multiplicative ergodicity and large deviations for an irreducible Markov chain, Stoch. Process. Appl., 90, 123, 10.1016/S0304-4149(00)00032-6
Borkar, 2002, Risk-sensitive optimal control for Markov decision processes with monotone cost, Math. Oper. Res., 27, 192, 10.1287/moor.27.1.192.334
Borkar, 2006, Stochastic approximation with 'controlled Markov noise', Systems Control Lett., 55, 139, 10.1016/j.sysconle.2005.06.005
Marbach, 2001, Simulation-based optimization of Markov reward processes, IEEE Trans. Automat. Control, 46, 191, 10.1109/9.905687
Borkar, 2002, Q-learning for risk-sensitive control, Math. Oper. Res., 27, 294, 10.1287/moor.27.2.294.324
Borkar, 2001, A sensitivity formula for the risk-sensitive cost and the actor-critic algorithm, Systems Control Lett., 44, 339, 10.1016/S0167-6911(01)00152-9
Borkar, 2006, 55, 139
Borkar, 2010
Bapat, 1989, Comparing the spectral radii of two nonnegative matrices, Amer. Math. Monthly, 96, 137, 10.1080/00029890.1989.11972159
Lindqvist, 2002, On comparison of the Perron-Frobenius eigenvalues of two ML-matrices, Linear Algebr. Appl., 353, 257, 10.1016/S0024-3795(02)00314-2
Bhatia, 1997
Cvetkovski, 2012
Minc, 1970, On the maximal eigenvector of a positive matrix, SIAM J. Numer. Anal., 7, 424, 10.1137/0707035