General value iteration based single network approach for constrained optimal controller design of partially-unknown continuous-time nonlinear systems

Journal of the Franklin Institute - Volume 355 - Pages 2610-2630 - 2018
Geyang Xiao (1,2), Huaguang Zhang (1,2), Qiuxia Qu (1,2), He Jiang (1,2)
(1) College of Information Science and Engineering, Northeastern University, Box 134, Shenyang 110819, PR China
(2) The Key Laboratory of Integrated Automation of Process Industry (Northeastern University) of the National Education Ministry, Shenyang, 110004, PR China
