General value iteration based single network approach for constrained optimal controller design of partially-unknown continuous-time nonlinear systems
References
Bertsekas, 1995, Vol. 1
Lewis, 2012
Sutton, 1998
Werbos, 1990, 67
Bertsekas, 1996
Silver, 2016, Mastering the game of Go with deep neural networks and tree search, Nature, 529, 484, 10.1038/nature16961
Teck-Hou, 2015, Self-organizing neural networks integrating domain knowledge and reinforcement learning, IEEE Trans. Neural Netw. Learn. Syst., 26, 889, 10.1109/TNNLS.2014.2327636
Elfwing, 2016, From free energy to expected energy: improving energy-based value function approximation in reinforcement learning, Neural Netw., 84, 17, 10.1016/j.neunet.2016.07.013
Kamalapurkar, 2016, Model-based reinforcement learning for infinite-horizon approximate optimal tracking, IEEE Trans. Neural Netw. Learn. Syst., 28, 753, 10.1109/TNNLS.2015.2511658
Kamalapurkar, 2016, Model-based reinforcement learning for approximate optimal regulation, Automatica, 64, 94, 10.1016/j.automatica.2015.10.039
Modares, 2016, Optimal model-free output synchronization of heterogeneous systems using off-policy reinforcement learning, Automatica, 71, 334, 10.1016/j.automatica.2016.05.017
Modares, 2016, Optimized assistive human-robot interaction using reinforcement learning, IEEE Trans. Cybern., 46, 655, 10.1109/TCYB.2015.2412554
Tangkaratt, 2016, Model-based reinforcement learning with dimension reduction, Neural Netw., 84, 1, 10.1016/j.neunet.2016.08.005
Wang, 2016, Fault-tolerant controller design for a class of nonlinear MIMO discrete-time systems via online reinforcement learning algorithm, IEEE Trans. Syst. Man Cybern. Syst., 46, 611, 10.1109/TSMC.2015.2478885
P.J. Werbos, Approximate Dynamic Programming for Real-Time Control and Neural Modeling, vol. 15, Van Nostrand Reinhold, pp. 493–525.
Prokhorov, 1997, Adaptive critic designs, IEEE Trans. Neural Netw., 8, 997, 10.1109/72.623201
Powell, 2007
Yu, 2011, Approximate dynamic programming for optimal stationary control with control-dependent noise, IEEE Trans. Neural Netw., 22, 2392, 10.1109/TNN.2011.2165729
Zhang, 2011, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Trans. Neural Netw., 22, 2226, 10.1109/TNN.2011.2168538
Zhang, 2013
Xiao, 2015, Online optimal control of unknown discrete-time nonlinear systems by using time-based adaptive dynamic programming, Neurocomputing, 165, 163, 10.1016/j.neucom.2015.03.006
Zhen, 2015, GrDHP: a general utility function representation for dual heuristic dynamic programming, IEEE Trans. Neural Netw. Learn. Syst., 26, 614, 10.1109/TNNLS.2014.2329942
Xiao, 2016, Data-driven optimal tracking control for a class of affine non-linear continuous-time systems with completely unknown dynamics, IET Control Theory Appl., 10, 700, 10.1049/iet-cta.2015.0590
Wei, 2017, Discrete-time deterministic Q-learning: a novel convergence analysis, IEEE Trans. Cybern., 47, 1224, 10.1109/TCYB.2016.2542923
Zhong, 2016, A theoretical foundation of goal representation heuristic dynamic programming, IEEE Trans. Neural Netw. Learn. Syst., 27, 2513, 10.1109/TNNLS.2015.2490698
Abu-Khalaf, 2005, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, 41, 779, 10.1016/j.automatica.2004.11.034
T. Dierks, S. Jagannathan, Optimal control of affine nonlinear continuous-time systems, in: Proceedings of the 2010 American Control Conference, pp. 1568–1573.
Liu, 2014, Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics, IEEE Trans. Syst. Man Cybern. Syst., 44, 1015, 10.1109/TSMC.2013.2295351
Liu, 2015, Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints, IEEE Trans. Cybern., 45, 1372, 10.1109/TCYB.2015.2417170
Al-Tamimi, 2008, Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Trans. Syst. Man Cybern. Part B Cybern., 38, 943, 10.1109/TSMCB.2008.926614
Abu-Khalaf, 2006, Policy iterations on the Hamilton–Jacobi–Isaacs equation for H∞ state feedback control with input saturation, IEEE Trans. Autom. Control, 51, 1989, 10.1109/TAC.2006.884959
Cheng, 2007, Fixed-final-time-constrained optimal control of nonlinear systems using neural network HJB approach, IEEE Trans. Neural Netw., 18, 1725, 10.1109/TNN.2007.905848
Modares, 2014, Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 50, 193, 10.1016/j.automatica.2013.09.043
Modares, 2014, Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning, Automatica, 50, 1780, 10.1016/j.automatica.2014.05.011
Luo, 2015, Reinforcement learning solution for HJB equation arising in constrained optimal control problem, Neural Netw., 71, 150, 10.1016/j.neunet.2015.08.007
Yang, 2016, Online approximate solution of HJI equation for unknown constrained-input nonlinear continuous-time systems, Inf. Sci., 328, 435, 10.1016/j.ins.2015.09.001
Yang, 2016, Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning, Inf. Sci., 369, 731, 10.1016/j.ins.2016.07.051
Yang, 2013, Neural-network-based online optimal control for uncertain non-linear continuous-time systems with control constraints, IET Control Theory Appl., 7, 2037, 10.1049/iet-cta.2013.0472
Zhang, 2009, Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints, IEEE Trans. Neural Netw., 20, 1490, 10.1109/TNN.2009.2027233
Song, 2010, Optimal control laws for time-delay systems with saturating actuators based on heuristic dynamic programming, Neurocomputing, 73, 3020, 10.1016/j.neucom.2010.07.005
Liu, 2013, An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs, Inf. Sci., 220, 331, 10.1016/j.ins.2012.07.006
Zhang, 2008, A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm, IEEE Trans. Syst. Man Cybern. Part B Cybern., 38, 937, 10.1109/TSMCB.2008.920269
Zhang, 2011, Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming, IEEE Trans. Neural Netw., 22, 1851, 10.1109/TNN.2011.2172628
Zhang, 2014, Neural-network-based constrained optimal control scheme for discrete-time switched nonlinear system using dual heuristic programming, IEEE Trans. Autom. Sci. Eng., 11, 839, 10.1109/TASE.2014.2303139
Dierks, 2009, Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence, Neural Netw., 22, 851, 10.1016/j.neunet.2009.06.014
Wei, 2012, An iterative ϵ-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state, Neural Netw., 32, 236, 10.1016/j.neunet.2012.02.027
Li, 2012, Optimal control for discrete-time affine non-linear systems using general value iteration, IET Control Theory Appl., 6, 2725, 10.1049/iet-cta.2011.0783
Wei, 2016, Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems, IEEE Trans. Cybern., 46, 840, 10.1109/TCYB.2015.2492242
Padhi, 2006, A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems, Neural Netw., 19, 1648, 10.1016/j.neunet.2006.08.010
Zhang, 2012, Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP, IEEE Trans. Syst. Man Cybern. Part B Cybern., 43, 206
Heydari, 2013, Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics, IEEE Trans. Neural Netw. Learn. Syst., 24, 145, 10.1109/TNNLS.2012.2227339
Wang, 2013, Neuro-optimal control for a class of unknown nonlinear dynamic systems using SN-DHP technique, Neurocomputing, 121, 218, 10.1016/j.neucom.2013.04.006
Modares, 2013, A policy iteration approach to online optimal control of continuous-time constrained-input systems, ISA Trans., 52, 611, 10.1016/j.isatra.2013.04.004