Scalable lifelong reinforcement learning
References
Kober, 2009, Policy search for motor primitives in robotics, 849
Murphy, 2007, Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders, Neuropsychopharmacology, 32, 257, 10.1038/sj.npp.1301241
Pineau, 2007, Constructing evidence-based treatment strategies using methods from computer science, Drug Alcohol Depend., 88, S52, 10.1016/j.drugalcdep.2007.01.005
Sutton, 1998
Wilson, 2007, Multi-task reinforcement learning: a hierarchical Bayesian approach, 1015
Taylor, 2009, Transfer learning for reinforcement learning domains: a survey, J. Mach. Learn. Res., 10, 1633
Lazaric, 2010, Bayesian multi-task reinforcement learning
Li, 2009, Multi-task reinforcement learning in partially observable stochastic environments, J. Mach. Learn. Res., 10, 1131
Bou-Ammar, 2014, Online multi-task learning for policy gradient methods
Williams, 1992, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn., 8, 229, 10.1007/BF00992696
Bhatnagar, 2009, Natural actor–critic algorithms, Automatica, 45, 2471, 10.1016/j.automatica.2009.07.008
Peters, 2008, Natural actor-critic, Neurocomputing, 71, 1180, 10.1016/j.neucom.2007.11.026
Ruvolo, 2013, ELLA: an efficient lifelong learning algorithm
Thrun, 1996, Discovering structure in multiple learning tasks: the TC algorithm
Caarls, 2016, Parallel online temporal difference learning for motor control, IEEE Trans. Neural Netw. Learn. Syst., 27, 1457, 10.1109/TNNLS.2015.2442233
S. Gu, E. Holly, T. Lillicrap, S. Levine, Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, arXiv preprint arXiv:1610.00633 (2016).
A. Yahya, A. Li, M. Kalakrishnan, Y. Chebotar, S. Levine, Collective robot reinforcement learning with distributed asynchronous guided policy search, arXiv preprint arXiv:1610.00673 (2016).
Levine, 2016, End-to-end training of deep visuomotor policies, J. Mach. Learn. Res., 17, 1
Deisenroth, 2014, Multi-task policy search for robotics, 3876
Snel, 2014, Learning potential functions and their representations for multi-task reinforcement learning, Auton. Agent Multi Agent Syst., 28, 637, 10.1007/s10458-013-9235-z
Kumar, 2012, Learning task grouping and overlap in multi-task learning, 1383
Bou Ammar, 2015, Autonomous cross-domain knowledge transfer in lifelong policy gradient reinforcement learning
Boyd, 2011, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3, 1, 10.1561/2200000016
Wei, 2012, Distributed alternating direction method of multipliers, 5445
Tibshirani, 1996, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Series B (Methodological), 58, 267