A Q-learning predictive control scheme with guaranteed stability
References
Aswani, 2013, Provably safe and robust learning-based model predictive control, Automatica, 49, 1216, 10.1016/j.automatica.2013.02.003
Barto, 1992, Reinforcement learning and adaptive critic methods
Barto, 1995, Reinforcement learning and dynamic programming, IFAC Proc. Vol., 28, 407, 10.1016/S1474-6670(17)45266-9
Barto, 1983, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern., SMC-13, 834, 10.1109/TSMC.1983.6313077
Beard, 1996, Improving the performance of stabilizing controls for nonlinear systems, IEEE Control Syst. Mag., 16, 27, 10.1109/37.537206
Beckenbach, 2018, Addressing infinite-horizon optimality in MPC via Q-learning, IFAC-PapersOnLine, 51, 60, 10.1016/j.ifacol.2018.10.175
Bellman, 1957
Berkenkamp, 2017, Safe model-based reinforcement learning with stability guarantees
Bertsekas, 1987
Bertsekas, 2005, Dynamic programming and sub-optimal control: a survey from ADP to MPC, Eur. J. Control, 11, 310, 10.3166/ejc.11.310-334
Bertsekas, 2012, Dynamic Programming and Optimal Control, Vol. II
Beuchat, 2020, Performance guarantees for model-based approximate dynamic programming in continuous spaces, IEEE Trans. Autom. Control, 65, 143, 10.1109/TAC.2019.2906423
Boyan, 1999, Least-squares temporal difference learning
Bradtke, 1996, Linear least-squares algorithms for temporal difference learning, Mach. Learn., 22, 33, 10.1007/BF00114723
Bradtke, 1994, Adaptive linear quadratic control using policy iteration
Chen, 1998, A quasi-infinite horizon nonlinear model predictive control scheme with guaranteed stability, Automatica, 34, 1205, 10.1016/S0005-1098(98)00073-9
Chen, 2000, Model predictive control of nonlinear systems: computational burden and stability, IEE Proc. - Control Theory Appl., 147, 387, 10.1049/ip-cta:20000379
Doyle, 1995, Nonlinear model-based control using second-order Volterra models, Automatica, 31, 697, 10.1016/0005-1098(94)00150-H
Ernst, 2007, Model predictive control and reinforcement learning as two complementary frameworks, Int. J. Tomogr. Stat., 6, 122
Farahmand, 2010, Error propagation for approximate policy and value iteration, 568
S. Fujimoto, D. Meger, D. Precup, Off-policy deep reinforcement learning without exploration, 2019. Available at arXiv:1812.02900v2 [cs.LG].
Grimm, 2005, Model predictive control: for want of a local control Lyapunov function, all is not lost, IEEE Trans. Autom. Control, 50, 546, 10.1109/TAC.2005.847055
Gros, 2019, Data-driven economic NMPC using reinforcement learning, IEEE Trans. Autom. Control (Early Access)
Grüne, 2009, Analysis and design of unconstrained nonlinear MPC schemes for finite and infinite dimensional systems, SIAM J. Control Optim., 48, 1206, 10.1137/070707853
Grüne, 2012, NMPC without terminal constraints, IFAC Proc. Vol., 45, 1, 10.3182/20120823-5-NL-3013.00030
Grüne, 2017, Nonlinear Model Predictive Control
Grüne, 2008, On the infinite horizon performance of receding horizon controllers, IEEE Trans. Autom. Control, 53, 2100, 10.1109/TAC.2008.927799
Grześ, 2010, Online learning of shaping rewards in reinforcement learning, Neural Netw., 23, 541, 10.1016/j.neunet.2010.01.001
Hagen, 1998, Linear quadratic regulation using reinforcement learning, 39
Hertneck, 2018, Learning an approximate model predictive controller with guarantees, IEEE Control Syst. Lett., 2, 543, 10.1109/LCSYS.2018.2843682
A. Heydari, Stabilizing value iteration with and without approximation errors, 2015a. Available at arXiv:1412.5675v2 [cs.SY].
Heydari, 2015, Theoretical and numerical analysis of approximate dynamic programming with approximation errors, J. Guidance, Control Dyn., 39, 301, 10.2514/1.G001154
Heydari, 2014, Global optimality of approximate dynamic programming and its use in non-convex function minimization, Appl. Soft Comp., 24, 291, 10.1016/j.asoc.2014.07.003
Jadbabaie, 2001, Unconstrained receding horizon control with no terminal cost
Jadbabaie, 2001, Unconstrained receding-horizon control of nonlinear systems, IEEE Trans. Autom. Control, 46, 776, 10.1109/9.920800
John, 1994, When the best move isn’t optimal: Q-learning with exploration
Kaelbling, 1996, Reinforcement learning: a survey, J. Artif. Intell. Res., 4, 237, 10.1613/jair.301
Kiumarsi, 2014, Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics, Automatica, 50, 1167, 10.1016/j.automatica.2014.02.015
Landelius, 1996, Greedy adaptive critics for LQR problems: convergence proofs
Lavretsky, 2000, Greedy optimal control
Lazaric, 2012, Finite-sample analysis of least-squares policy iteration, J. Mach. Learn. Res., 13, 3041
Lee, 2005, Approximate dynamic programming-based approaches for input-output data-driven control of nonlinear processes, Automatica, 41, 1281, 10.1016/j.automatica.2005.02.006
Lewis, 2009, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits Syst. Mag., 9, 32, 10.1109/MCAS.2009.933854
Li, 2019, Off-policy interleaved Q-learning: optimal control for affine nonlinear discrete-time systems, IEEE Trans. Neural Netw. Learn. Syst., 30, 1308, 10.1109/TNNLS.2018.2861945
T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning, 2016. Available at arXiv:1509.02971v5 [cs.LG].
Limon, 2006, On the stability of constrained MPC without terminal constraint, IEEE Trans. Autom. Control, 51, 832, 10.1109/TAC.2006.875014
Lin, 1993
Liu, 2013, Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems, IEEE Trans. Cybern., 43, 779, 10.1109/TSMCB.2012.2216523
Liu, 2014, Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems, IEEE Trans. Neural Netw. Learn. Syst., 25, 621, 10.1109/TNNLS.2013.2281663
Liu, 2000, Convergence analysis of adaptive critic based optimal control
Luo, 2016, Model-free optimal tracking control via critic-only Q-learning, IEEE Trans. Neural Netw. Learn. Syst., 27, 2134, 10.1109/TNNLS.2016.2585520
Mayne, 2000, Constrained model predictive control: Stability and optimality, Automatica, 36, 789, 10.1016/S0005-1098(99)00214-9
Mhaskar, 2006, Stabilization of nonlinear systems with state and control constraints using Lyapunov-based predictive control, Syst. Control Lett., 55, 650, 10.1016/j.sysconle.2005.09.014
Ng, 1999, Policy invariance under reward transformations: theory and application to reward shaping
de Oliveira Kothare, 2000, Contractive model predictive control for constrained nonlinear systems, IEEE Trans. Autom. Control, 45, 1053, 10.1109/9.863592
Osinenko, 2017, Stacked adaptive dynamic programming with unknown system model, IFAC-PapersOnLine, 50, 4150, 10.1016/j.ifacol.2017.08.803
Ostafew, 2014, Learning-based nonlinear model predictive control to improve vision-based mobile robot path-tracking in challenging outdoor environments
de la Peña, 2008, Lyapunov-based model predictive control of nonlinear systems subject to data losses, IEEE Trans. Autom. Control, 53, 2076, 10.1109/TAC.2008.929401
Primbs, 2000, Feasibility and stability of constrained finite receding horizon control, Automatica, 36, 965, 10.1016/S0005-1098(00)00004-2
Rawlings, 2017
Reble, 2012, Unconstrained model predictive control and suboptimality estimates for nonlinear continuous-time systems, Automatica, 48, 1812, 10.1016/j.automatica.2012.05.067
Recht, 2019, A tour of reinforcement learning: the view from continuous control, Annu. Rev. Control Robot. Auton. Syst., 2, 253, 10.1146/annurev-control-053018-023825
Rummery, 1994, 37
Salvador, 2018, Data-based predictive control via direct weight optimization, IFAC-PapersOnLine, 51, 356, 10.1016/j.ifacol.2018.11.059
Sarangapani, 2006, Neural network control of nonlinear discrete-time systems
Singh, 2000, Convergence results for single-step on-policy reinforcement-learning algorithms, Mach. Learn., 38, 287, 10.1023/A:1007678930559
Sokolov, 2013, Improved stability criteria of ADP control for efficient context-aware decision support systems
Sokolov, 2015, Complete stability analysis of a heuristic approximate dynamic programming control design, Automatica, 59, 9, 10.1016/j.automatica.2015.06.001
Soloperto, 2018, Learning-based robust model predictive control with state-dependent uncertainty, IFAC-PapersOnLine, 51, 442, 10.1016/j.ifacol.2018.11.052
Sutton, 1988, Learning to predict by the methods of temporal differences, Mach. Learn., 3, 9, 10.1007/BF00115009
Sutton, 2000, Policy gradient methods for reinforcement learning with function approximation, 1057
Al-Tamimi, 2007, Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control, Automatica, 43, 473, 10.1016/j.automatica.2006.09.019
Al-Tamimi, 2008, Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Trans. Syst. Man Cybern. Part B (Cybern.), 38, 943, 10.1109/TSMCB.2008.926614
Tsitsiklis, 1997, An analysis of temporal difference learning with function approximation, IEEE Trans. Autom. Control, 42, 674, 10.1109/9.580874
Tuna, 2006, Shorter horizons for model predictive control
Vamvoudakis, 2010, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 46, 878, 10.1016/j.automatica.2010.02.018
Wang, 2012, Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica, 48, 1825, 10.1016/j.automatica.2012.05.049
Wang, 2009, Adaptive dynamic programming: an introduction, IEEE Comput. Intell. Mag., 4, 39, 10.1109/MCI.2009.932261
Watkins, 1992, Q-learning, Mach. Learn., 8, 279, 10.1007/BF00992698
Wei, 2017, Discrete-time deterministic Q-learning: a novel convergence analysis, IEEE Trans. Cybern., 47, 1224, 10.1109/TCYB.2016.2542923
Xu, 2002, Efficient reinforcement learning using recursive least-squares methods, J. Artif. Intell. Res., 16, 259, 10.1613/jair.946
Zanon, 2019, Practical reinforcement learning of stabilizing economic MPC
Zhang, 2013, Adaptive dynamic programming for control, 10.1007/978-1-4471-4757-2