Risk-averse autonomous systems: A brief history and recent developments from the perspective of optimal control

Artificial Intelligence - Volume 311 - Page 103743 - 2022
Yuheng Wang¹, Margaret P. Chapman¹
¹Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King's College Road, Toronto, Ontario, M5S 3G8, Canada
