Risk-averse autonomous systems: A brief history and recent developments from the perspective of optimal control
References
Sutton, 2014
Bertsekas, 1996
Bertsekas, 1971, On the minimax reachability of target sets and target tubes, Automatica, 7, 233, 10.1016/0005-1098(71)90066-5
Heger, 1994, Consideration of risk in reinforcement learning, 105
Coraluppi, 1999, Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes, Automatica, 35, 301, 10.1016/S0005-1098(98)00153-8
Morimoto, 2005, Robust reinforcement learning, Neural Comput., 17, 335, 10.1162/0899766053011528
Margellos, 2011, Hamilton–Jacobi formulation for reach–avoid differential games, IEEE Trans. Autom. Control, 56, 1849, 10.1109/TAC.2011.2105730
Chen, 2018, Hamilton–Jacobi reachability: some recent theoretical advances and applications in unmanned airspace management, Annu. Rev. Control Robotics Auton. Syst., 1, 333, 10.1146/annurev-control-060117-104941
Pecka, 2014, Safe exploration techniques for reinforcement learning–an overview, 357
García, 2012, Safe exploration of state and action spaces in reinforcement learning, J. Artif. Intell. Res., 45, 515, 10.1613/jair.3761
Ravichandar, 2020, Recent advances in robot learning from demonstration, Annu. Rev. Control Robotics Auton. Syst., 3, 297, 10.1146/annurev-control-100819-063206
García, 2015, A comprehensive survey on safe reinforcement learning, J. Mach. Learn. Res., 16, 1437
Hewing, 2020, Learning-based model predictive control: toward safe learning in control, Annu. Rev. Control Robotics Auton. Syst., 3, 269, 10.1146/annurev-control-090419-075625
Brunke, 2022, Safe learning in robotics: from learning-based control to safe reinforcement learning, Annu. Rev. Control Robotics Auton. Syst., 5, 411, 10.1146/annurev-control-042920-020211
Azar, 2020, From inverse optimal control to inverse reinforcement learning: a historical review, Annu. Rev. Control, 50, 119, 10.1016/j.arcontrol.2020.06.001
Arora, 2021, A survey of inverse reinforcement learning: challenges, methods and progress, Artif. Intell., 297, 10.1016/j.artint.2021.103500
Folland, 1999
Ash, 1972
Hernández-Lerma, 1996
Chapman, 2021, Risk-sensitive safety analysis using conditional value-at-risk, IEEE Trans. Autom. Control
Chapman, 2022, On optimizing the conditional value-at-risk of a maximum cost for risk-averse safety analysis, IEEE Trans. Autom. Control, 10.1109/TAC.2021.3131149
Pnueli, 1977, The temporal logic of programs, 46
Coogan, 2017, Formal methods for control of traffic flow: automated control synthesis from finite-state transition models, IEEE Control Syst. Mag., 37, 109, 10.1109/MCS.2016.2643259
Kwiatkowska, 2007, Stochastic model checking, 220
Forejt, 2011, Automated verification techniques for probabilistic systems, 53
Shapiro, 2009
Eeckhoudt, 2005
Bernoulli, 1954, Exposition of a new theory on the measurement of risk, Econometrica, 22, 23, 10.2307/1909829
von Neumann, 1944
Bäuerle, 2014, More risk-sensitive Markov decision processes, Math. Oper. Res., 39, 105, 10.1287/moor.2013.0601
Whittle, 1981, Risk-sensitive linear/quadratic/Gaussian control, Adv. Appl. Probab., 13, 764, 10.2307/1426972
Markowitz, 1952, Portfolio selection, J. Finance, 7, 77
Markowitz, 1959
Won, 2005, Cost-cumulants and risk-sensitive control, 1061
Miller, 2017, Optimal control of conditional value-at-risk in continuous time, SIAM J. Control Optim., 55, 856, 10.1137/16M1058492
Rockafellar, 2000, Optimization of conditional value-at-risk, J. Risk, 2, 21, 10.21314/JOR.2000.038
Rockafellar, 2002, Conditional value-at-risk for general loss distributions, J. Bank. Finance, 26, 1443, 10.1016/S0378-4266(02)00271-6
Acerbi, 2002, On the coherence of expected shortfall, J. Bank. Finance, 26, 1487, 10.1016/S0378-4266(02)00283-2
Shapiro, 2012, Minimax and risk averse multistage stochastic programming, Eur. J. Oper. Res., 219, 719, 10.1016/j.ejor.2011.11.005
Ruszczyński, 2010, Risk-averse dynamic programming for Markov decision processes, Math. Program., 125, 235, 10.1007/s10107-010-0393-3
Ruszczyński, 2014, Erratum to: risk-averse dynamic programming for Markov decision processes, Math. Program., 145, 601, 10.1007/s10107-014-0783-z
Bäuerle, 2022, Markov decision processes with recursive risk measures, Eur. J. Oper. Res., 296, 953, 10.1016/j.ejor.2021.04.030
Shen, 2013, Risk-sensitive Markov control processes, SIAM J. Control Optim., 51, 3652, 10.1137/120899005
Singh, 2018, A framework for time-consistent, risk-sensitive model predictive control: theory and algorithms, IEEE Trans. Autom. Control, 64, 2905, 10.1109/TAC.2018.2874704
Köse, 2021, Risk-averse learning by temporal difference methods with Markov risk measures, J. Mach. Learn. Res., 22, 1
Artzner, 1999, Coherent measures of risk, Math. Finance, 9, 203, 10.1111/1467-9965.00068
Majumdar, 2020, How should a robot assess risk? Towards an axiomatic theory of risk in robotics, 75
Kisiala, 2015
Pflug, 2016, Time-consistent decisions and temporal decomposition of coherent risk functionals, Math. Oper. Res., 41, 682, 10.1287/moor.2015.0747
Bäuerle, 2011, Markov decision processes with average-value-at-risk criteria, Math. Methods Oper. Res., 74, 361, 10.1007/s00186-011-0367-0
Haskell, 2015, A convex analytic approach to risk-aware Markov decision processes, SIAM J. Control Optim., 53, 1569, 10.1137/140969221
Bäuerle, 2021, Minimizing spectral risk measures applied to Markov decision processes, Math. Methods Oper. Res., 94, 35, 10.1007/s00186-021-00746-w
Smith
Glover, 1988, State-space formulae for all stabilizing controllers that satisfy an H∞-norm bound and relations to risk sensitivity, Syst. Control Lett., 11, 167, 10.1016/0167-6911(88)90055-2
Löfberg, 2003
Blanchini, 1999, Set invariance in control, Automatica, 35, 1747, 10.1016/S0005-1098(99)00113-2
Wan, 2003, An efficient off-line formulation of robust model predictive control using linear matrix inequalities, Automatica, 39, 837, 10.1016/S0005-1098(02)00174-7
Nilsson, 2016, Synthesis of separable controlled invariant sets for modular local control design, 5656
Majumdar, 2014, Control and verification of high-dimensional systems with DSOS and SDSOS programming, 394
Ahmadi, 2019, DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization, SIAM J. Appl. Algebra Geom., 3, 193, 10.1137/18M118935X
Mitchell, 2005, A time-dependent Hamilton–Jacobi formulation of reachable sets for continuous dynamic games, IEEE Trans. Autom. Control, 50, 947, 10.1109/TAC.2005.851439
Fisac, 2015, Reach-avoid problems with time-varying dynamics, targets and constraints, 11
Chen, 2013, Flow*: an analyzer for non-linear hybrid systems, 258
Dutta, 2019, Reachability analysis for neural feedback systems using regressive polynomial rule inference, 157
Ivanov, 2020, Verifying the safety of autonomous systems with neural network controllers, ACM Trans. Embed. Comput. Syst., 20, 1, 10.1145/3419742
Eggers, 2008, A direct SAT approach to hybrid systems, 171
Gao, 2013, dReal: an SMT solver for nonlinear theories over the reals, 208
Kong, 2015, dReach: δ-reachability analysis for hybrid systems, 200
Ivanov, 2019, Verisig: verifying safety properties of hybrid systems with neural network controllers, 169
Huang, 2019, ReachNN: reachability analysis of neural-network controlled systems, ACM Trans. Embed. Comput. Syst., 18, 1, 10.1145/3358228
Başar, 1995
Raman, 2014, Model predictive control with signal temporal logic specifications, 81
Geibel, 2001, Reinforcement learning with bounded risk, 162
Geibel, 2005, Risk-sensitive reinforcement learning applied to control under constraints, J. Artif. Intell. Res., 24, 81, 10.1613/jair.1666
Abate, 2008, Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems, Automatica, 44, 2724, 10.1016/j.automatica.2008.03.027
Ding, 2013, A stochastic games framework for verification and control of discrete time stochastic hybrid systems, Automatica, 49, 2665, 10.1016/j.automatica.2013.05.025
Yang, 2018, A dynamic game approach to distributionally robust safety specifications for stochastic systems, Automatica, 94, 94, 10.1016/j.automatica.2018.04.022
Summers, 2010, Verification of discrete time stochastic hybrid systems: a stochastic reach-avoid decision problem, Automatica, 46, 1951, 10.1016/j.automatica.2010.08.006
Moldovan, 2012, Safe exploration in Markov decision processes
Schildbach, 2014, The scenario approach for stochastic model predictive control with bounds on closed-loop constraint violations, Automatica, 50, 3009, 10.1016/j.automatica.2014.10.035
Sadigh, 2016, Safe control under uncertainty with probabilistic signal temporal logic
Jha, 2018, Safe autonomy under perception uncertainty using chance-constrained temporal logic, J. Autom. Reason., 60, 43, 10.1007/s10817-017-9413-9
Farahani, 2018, Shrinking horizon model predictive control with signal temporal logic constraints under stochastic disturbances, IEEE Trans. Autom. Control, 64, 3324, 10.1109/TAC.2018.2880651
Bertsimas, 2018, Data-driven robust optimization, Math. Program., 167, 235, 10.1007/s10107-017-1125-8
Esfahani, 2018, Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations, Math. Program., 171, 115, 10.1007/s10107-017-1172-1
Yang, 2020, Wasserstein distributionally robust stochastic control: a data-driven approach, IEEE Trans. Autom. Control, 66, 3863, 10.1109/TAC.2020.3030884
Zakaria, 2020, Uncertainty models for stochastic optimization in renewable energy applications, Renew. Energy, 145, 1543, 10.1016/j.renene.2019.07.081
Harremoës, 1988, Stochastic models for estimation of extreme pollution from urban runoff, Water Res., 22, 1017, 10.1016/0043-1354(88)90149-2
del Giudice, 2015, Comparison of two stochastic techniques for reliable urban runoff prediction by modeling systematic errors, Water Resour. Res., 51, 5004, 10.1002/2014WR016678
Rao, 2002, Control, exploitation and tolerance of intracellular noise, Nature, 420, 231, 10.1038/nature01258
Eling, 2019, Challenges in measuring and understanding biological noise, Nat. Rev. Genet., 20, 536, 10.1038/s41576-019-0130-6
Howard, 1972, Risk-sensitive Markov decision processes, Manag. Sci., 18, 356, 10.1287/mnsc.18.7.356
Jacobson, 1973, Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games, IEEE Trans. Autom. Control, 18, 124, 10.1109/TAC.1973.1100265
Whittle, 1991, A risk-sensitive maximum principle: the case of imperfect state observation, IEEE Trans. Autom. Control, 36, 793, 10.1109/9.85059
di Masi, 1999, Risk-sensitive control of discrete-time Markov processes with infinite horizon, SIAM J. Control Optim., 38, 61, 10.1137/S0363012997320614
Borkar, 2002, Q-learning for risk-sensitive control, Math. Oper. Res., 27, 294, 10.1287/moor.27.2.294.324
Bielecki, 1999, Risk sensitive control of finite state Markov chains in discrete time, with applications to portfolio management, Math. Methods Oper. Res., 50, 167, 10.1007/s001860050094
Cavazos-Cadena, 2011, Discounted approximations for risk-sensitive average criteria in Markov decision chains with finite state space, Math. Oper. Res., 36, 133, 10.1287/moor.1100.0476
Blancas-Rivera, 2020, Discounted approximations in risk-sensitive average Markov cost chains with finite state space, Math. Methods Oper. Res., 91, 241, 10.1007/s00186-019-00689-3
di Masi, 2007, Infinite horizon risk sensitive control of discrete time Markov processes under minorization property, SIAM J. Control Optim., 46, 231, 10.1137/040618631
Jaśkiewicz, 2007, Average optimality for risk-sensitive control with general state space, Ann. Appl. Probab., 17, 654, 10.1214/105051606000000790
Anantharam, 2017, A variational formula for risk-sensitive reward, SIAM J. Control Optim., 55, 961, 10.1137/151002630
Chapman, 2021, Classical risk-averse control for a finite-horizon Borel model, IEEE Control Syst. Lett., 6, 1525, 10.1109/LCSYS.2021.3114126
Kreps, 1977, Decision problems with expected utility criteria, II: stationarity, Math. Oper. Res., 2, 266, 10.1287/moor.2.3.266
Chow, 2015, Risk-sensitive and robust decision-making: a CVaR optimization approach, 1522
Pflug, 2016, Time-inconsistent multistage stochastic programs: martingale bounds, Eur. J. Oper. Res., 249, 155, 10.1016/j.ejor.2015.02.033
Chapman, 2021, Toward a scalable upper bound for a CVaR-LQ problem, IEEE Control Syst. Lett., 6, 920, 10.1109/LCSYS.2021.3086842
Chapman, 2019, A risk-sensitive finite-time reachability approach for safety of stochastic dynamic systems, 2958
Asienkiewicz, 2017, A note on a new class of recursive utilities in Markov decision processes, Appl. Math., 44, 149
van Parys, 2015, Distributionally robust control of constrained stochastic systems, IEEE Trans. Autom. Control, 61, 430
Borkar, 2014, Risk-constrained Markov decision processes, IEEE Trans. Autom. Control, 59, 2574, 10.1109/TAC.2014.2309262
Samuelson, 2018, Safety-aware optimal control of stochastic systems using conditional value-at-risk, 6285
Lindemann, 2021, STL robustness risk over discrete-time stochastic processes, 1329
Lindemann, 2021, Reactive and risk-aware control for signal temporal logic, IEEE Trans. Autom. Control
Barbosa, 2021, Risk-aware motion planning in partially known environments, 5220
Safaoui, 2020, Control design for risk-based signal temporal logic specifications, IEEE Control Syst. Lett., 4, 1000, 10.1109/LCSYS.2020.2998543
Luce, 1957
Speyer, 1974, Optimization of stochastic linear systems with additive measurement and process noise using exponential performance criteria, IEEE Trans. Autom. Control, 19, 358, 10.1109/TAC.1974.1100606
Başar, 1999, Nash equilibria of risk-sensitive nonlinear stochastic differential games, J. Optim. Theory Appl., 100, 479, 10.1023/A:1022678204735
Moon, 2016, Linear quadratic risk-sensitive and robust mean field games, IEEE Trans. Autom. Control, 62, 1062, 10.1109/TAC.2016.2579264
Moon, 2019, Risk-sensitive mean field games via the stochastic maximum principle, Dyn. Games Appl., 9, 1100, 10.1007/s13235-018-00290-z
Saldi, 2020, Approximate Markov-Nash equilibria for discrete-time risk-sensitive mean-field games, Math. Oper. Res., 45, 1596, 10.1287/moor.2019.1044
Björk, 2014, A theory of Markovian time-inconsistent stochastic control in discrete time, Finance Stoch., 18, 545, 10.1007/s00780-014-0234-y
Witten, 1977, An adaptive optimal controller for discrete-time Markov environments, Inf. Control, 34, 286, 10.1016/S0019-9958(77)90354-0
Watkins, 1989
Watkins, 1992, Q-learning, Mach. Learn., 8, 279, 10.1007/BF00992698
Mihatsch, 2002, Risk-sensitive reinforcement learning, Mach. Learn., 49, 267, 10.1023/A:1017940631555
Shen, 2014, Risk-sensitive reinforcement learning, Neural Comput., 26, 1298, 10.1162/NECO_a_00600
Huang, 2017, Risk-aware Q-learning for Markov decision processes, 4928
Huang, 2021, Stochastic approximation for risk-aware Markov decision processes, IEEE Trans. Autom. Control, 66, 1314, 10.1109/TAC.2020.2989702
Hanna, 2021, Importance sampling in reinforcement learning with an estimated behavior policy, Mach. Learn., 1
Sastry, 1989
Schneider, 1996, Exploiting model uncertainty estimates for safe dynamic control learning, 1047
Perkins, 2002, Lyapunov design for safe reinforcement learning, J. Mach. Learn. Res., 3, 803