On tight bounds for function approximation error in risk-sensitive reinforcement learning
References
Kirkwood, 1991
Basu, 2008, A learning algorithm for risk-sensitive cost, Math. Oper. Res., 33, 880
Ruszczyński, 2010, Risk-averse dynamic programming for Markov decision processes, Math. Program., 125, 235, 10.1007/s10107-010-0393-3
Bäuerle, 2011, Markov decision processes with average-value-at-risk criteria, Math. Methods Oper. Res., 74, 361, 10.1007/s00186-011-0367-0
Ruszczyński, 2006
Pichler, 2017, A quantitative comparison of risk measures, Ann. Oper. Res., 254, 251, 10.1007/s10479-017-2397-3
Balaji, 2000, Multiplicative ergodicity and large deviations for an irreducible Markov chain, Stoch. Process. Appl., 90, 123, 10.1016/S0304-4149(00)00032-6
Borkar, 2002, Risk-sensitive optimal control for Markov decision processes with monotone cost, Math. Oper. Res., 27, 192, 10.1287/moor.27.1.192.334
Borkar, 2006, Stochastic approximation with ‘controlled Markov noise’, Systems Control Lett., 55, 139, 10.1016/j.sysconle.2005.06.005
Marbach, 2001, Simulation-based optimization of Markov reward processes, IEEE Trans. Automat. Control, 46, 191, 10.1109/9.905687
Borkar, 2002, Q-learning for risk-sensitive control, Math. Oper. Res., 27, 294, 10.1287/moor.27.2.294.324
Borkar, 2001, A sensitivity formula for the risk-sensitive cost and the actor-critic algorithm, Systems Control Lett., 44, 339, 10.1016/S0167-6911(01)00152-9
Borkar, 2010
Bapat, 1989, Comparing the spectral radii of two nonnegative matrices, Amer. Math. Monthly, 96, 137, 10.1080/00029890.1989.11972159
Lindqvist, 2002, On comparison of the Perron-Frobenius eigenvalues of two ML-matrices, Linear Algebra Appl., 353, 257, 10.1016/S0024-3795(02)00314-2
Bhatia, 1997
Cvetkovski, 2012
Minc, 1970, On the maximal eigenvector of a positive matrix, SIAM J. Numer. Anal., 7, 424, 10.1137/0707035