A Q-learning predictive control scheme with guaranteed stability

European Journal of Control - Volume 56 - Pages 167-178 - 2020
Lukas Beckenbach1, Pavel Osinenko1, Stefan Streif1
1Technische Universität Chemnitz, Automatic Control and System Dynamics Lab, 09107 Chemnitz, Germany

References

Aswani, 2013, Provably safe and robust learning-based model predictive control, Automatica, 49, 1216, 10.1016/j.automatica.2013.02.003
Barto, 1992, Reinforcement learning and adaptive critic methods
Barto, 1995, Reinforcement learning and dynamic programming, IFAC Proc. Vol., 28, 407, 10.1016/S1474-6670(17)45266-9
Barto, 1983, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern., SMC-13, 834, 10.1109/TSMC.1983.6313077
Beard, 1996, Improving the performance of stabilizing controls for nonlinear systems, IEEE Control Syst. Mag., 16, 27, 10.1109/37.537206
Beckenbach, 2018, Addressing infinite-horizon optimality in MPC via Q-learning, IFAC-PapersOnLine, 51, 60, 10.1016/j.ifacol.2018.10.175
Bellman, 1957
Berkenkamp, 2017, Safe model-based reinforcement learning with stability guarantees
Bertsekas, 1987
Bertsekas, 2005, Dynamic programming and sub-optimal control: a survey from ADP to MPC, Eur. J. Control, 11, 310, 10.3166/ejc.11.310-334
Bertsekas, 2012, II
Beuchat, 2020, Performance guarantees for model-based approximate dynamic programming in continuous spaces, IEEE Trans. Autom. Control, 65, 143, 10.1109/TAC.2019.2906423
Boyan, 1999, Least-squares temporal difference learning
Bradtke, 1996, Linear least-squares algorithms for temporal difference learning, Mach. Learn., 22, 33, 10.1007/BF00114723
Bradtke, 1994, Adaptive linear quadratic control using policy iteration
Chen, 1998, A quasi-infinite horizon nonlinear model predictive control scheme with guaranteed stability, Automatica, 34, 1205, 10.1016/S0005-1098(98)00073-9
Chen, 2000, Model predictive control of nonlinear systems: computational burden and stability, IEE Proc. - Control Theory Appl., 147, 387, 10.1049/ip-cta:20000379
Doyle, 1995, Nonlinear model-based control using second-order Volterra models, Automatica, 31, 697, 10.1016/0005-1098(94)00150-H
Ernst, 2007, Model predictive control and reinforcement learning as two complementary frameworks, Int. J. Tomogr. Stat., 6, 122
Farahmand, 2010, Error propagation for approximate policy and value iteration, 568
S. Fujimoto, D. Meger, D. Precup, Off-policy deep reinforcement learning without exploration, 2019. Available at arXiv:1812.02900v2 [cs.LG].
Grimm, 2005, Model predictive control: for want of a local control Lyapunov function, all is not lost, IEEE Trans. Autom. Control, 50, 546, 10.1109/TAC.2005.847055
Gros, 2019, Data-driven economic NMPC using reinforcement learning, IEEE Trans. Autom. Control (Early Access)
Grüne, 2009, Analysis and design of unconstrained nonlinear MPC schemes for finite and infinite dimensional systems, SIAM J. Control Optim., 48, 1206, 10.1137/070707853
Grüne, 2012, NMPC without terminal constraints, IFAC Proc. Vol., 45, 1, 10.3182/20120823-5-NL-3013.00030
Grüne, 2017, Nonlinear Model Predictive Control
Grüne, 2008, On the infinite horizon performance of receding horizon controllers, IEEE Trans. Autom. Control, 53, 2100, 10.1109/TAC.2008.927799
Grześ, 2010, Online learning of shaping rewards in reinforcement learning, Neural Netw., 23, 541, 10.1016/j.neunet.2010.01.001
Hagen, 1998, Linear quadratic regulation using reinforcement learning, 39
Hertneck, 2018, Learning an approximate model predictive controller with guarantees, IEEE Control Syst. Lett., 2, 543, 10.1109/LCSYS.2018.2843682
A. Heydari, Stabilizing value iteration with and without approximation errors, 2015a. Available at arXiv:1412.5675v2 [cs.SY].
Heydari, 2015, Theoretical and numerical analysis of approximate dynamic programming with approximation errors, J. Guid. Control Dyn., 39, 301, 10.2514/1.G001154
Heydari, 2014, Global optimality of approximate dynamic programming and its use in non-convex function minimization, Appl. Soft Comput., 24, 291, 10.1016/j.asoc.2014.07.003
Jadbabaie, 2001, Unconstrained receding horizon control with no terminal cost
Jadbabaie, 2001, Unconstrained receding-horizon control of nonlinear systems, IEEE Trans. Autom. Control, 46, 776, 10.1109/9.920800
John, 1994, When the best move isn’t optimal: Q-learning with exploration
Kaelbling, 1996, Reinforcement learning: a survey, J. Artif. Intell. Res., 4, 237, 10.1613/jair.301
Kiumarsi, 2014, Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics, Automatica, 50, 1167, 10.1016/j.automatica.2014.02.015
Landelius, 1996, Greedy adaptive critics for LQR problems: convergence proofs
Lavretsky, 2000, Greedy optimal control
Lazaric, 2012, Finite-sample analysis of least-squares policy iteration, J. Mach. Learn. Res., 13, 3041
Lee, 2005, Approximate dynamic programming-based approaches for input-output data-driven control of nonlinear processes, Automatica, 41, 1281, 10.1016/j.automatica.2005.02.006
Lewis, 2009, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits Syst. Mag., 9, 32, 10.1109/MCAS.2009.933854
Li, 2019, Off-policy interleaved Q-learning: optimal control for affine nonlinear discrete-time systems, IEEE Trans. Neural Netw. Learn. Syst., 30, 1308, 10.1109/TNNLS.2018.2861945
T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning, 2016. Available at arXiv:1509.02971v5 [cs.LG].
Limon, 2006, On the stability of constrained MPC without terminal constraint, IEEE Trans. Autom. Control, 51, 832, 10.1109/TAC.2006.875014
Lin, 1993
Liu, 2013, Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems, IEEE Trans. Cybern., 43, 779, 10.1109/TSMCB.2012.2216523
Liu, 2014, Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems, IEEE Trans. Neural Netw. Learn. Syst., 25, 621, 10.1109/TNNLS.2013.2281663
Liu, 2000, Convergence analysis of adaptive critic based optimal control
Luo, 2016, Model-free optimal tracking control via critic-only Q-learning, IEEE Trans. Neural Netw. Learn. Syst., 27, 2134, 10.1109/TNNLS.2016.2585520
Mayne, 2000, Constrained model predictive control: stability and optimality, Automatica, 36, 789, 10.1016/S0005-1098(99)00214-9
Mhaskar, 2006, Stabilization of nonlinear systems with state and control constraints using Lyapunov-based predictive control, Syst. Control Lett., 55, 650, 10.1016/j.sysconle.2005.09.014
Ng, 1999, Policy invariance under reward transformations: theory and application to reward shaping
de Oliveira Kothare, 2000, Contractive model predictive control for constrained nonlinear systems, IEEE Trans. Autom. Control, 45, 1053, 10.1109/9.863592
Osinenko, 2017, Stacked adaptive dynamic programming with unknown system model, IFAC-PapersOnLine, 50, 4150, 10.1016/j.ifacol.2017.08.803
Ostafew, 2014, Learning-based nonlinear model predictive control to improve vision-based mobile robot path-tracking in challenging outdoor environments
de la Peña, 2008, Lyapunov-based model predictive control of nonlinear systems subject to data losses, IEEE Trans. Autom. Control, 53, 2076, 10.1109/TAC.2008.929401
Primbs, 2000, Feasibility and stability of constrained finite receding horizon control, Automatica, 36, 965, 10.1016/S0005-1098(00)00004-2
Rawlings, 2017
Reble, 2012, Unconstrained model predictive control and suboptimality estimates for nonlinear continuous-time systems, Automatica, 48, 1812, 10.1016/j.automatica.2012.05.067
Recht, 2019, A tour of reinforcement learning: the view from continuous control, Annu. Rev. Control Robot. Auton. Syst., 2, 253, 10.1146/annurev-control-053018-023825
Rummery, 1994, 37
Salvador, 2018, Data-based predictive control via direct weight optimization, IFAC-PapersOnLine, 51, 356, 10.1016/j.ifacol.2018.11.059
Sarangapani, 2006, Neural network control of nonlinear discrete-time systems
Singh, 2000, Convergence results for single-step on-policy reinforcement-learning algorithms, Mach. Learn., 38, 287, 10.1023/A:1007678930559
Sokolov, 2013, Improved stability criteria of ADP control for efficient context-aware decision support systems
Sokolov, 2015, Complete stability analysis of a heuristic approximate dynamic programming control design, Automatica, 59, 9, 10.1016/j.automatica.2015.06.001
Soloperto, 2018, Learning-based robust model predictive control with state-dependent uncertainty, IFAC-PapersOnLine, 51, 442, 10.1016/j.ifacol.2018.11.052
Sutton, 1988, Learning to predict by the methods of temporal differences, Mach. Learn., 3, 9, 10.1007/BF00115009
Sutton, 2000, Policy gradient methods for reinforcement learning with function approximation, 1057
Al-Tamimi, 2007, Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control, Automatica, 43, 473, 10.1016/j.automatica.2006.09.019
Al-Tamimi, 2008, Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Trans. Syst. Man Cybern. Part B (Cybernetics), 38, 943, 10.1109/TSMCB.2008.926614
Tsitsiklis, 1997, An analysis of temporal difference learning with function approximation, IEEE Trans. Autom. Control, 42, 674, 10.1109/9.580874
Tuna, 2006, Shorter horizons for model predictive control
Vamvoudakis, 2010, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 46, 878, 10.1016/j.automatica.2010.02.018
Wang, 2012, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica, 48, 1825, 10.1016/j.automatica.2012.05.049
Wang, 2009, Adaptive dynamic programming: an introduction, IEEE Comput. Intell. Mag., 4, 39, 10.1109/MCI.2009.932261
Watkins, 1992, Q-learning, Mach. Learn., 8, 279, 10.1007/BF00992698
Wei, 2017, Discrete-time deterministic Q-learning: a novel convergence analysis, IEEE Trans. Cybern., 47, 1224, 10.1109/TCYB.2016.2542923
Xu, 2002, Efficient reinforcement learning using recursive least-squares methods, J. Artif. Intell. Res., 16, 259, 10.1613/jair.946
Zanon, 2019, Practical reinforcement learning of stabilizing economic MPC
Zhang, 2013, Adaptive dynamic programming for control, 10.1007/978-1-4471-4757-2