Scalable lifelong reinforcement learning

Pattern Recognition - Volume 72 - Pages 407-418 - 2017
Yusen Zhan¹, Haitham Bou Ammar², Matthew E. Taylor¹
¹ The School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163, USA
² PROWLER.io, Cambridge, United Kingdom

References

Kober, 2009, Policy search for motor primitives in robotics, 849
Murphy, 2007, Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders, Neuropsychopharmacology, 32, 257, 10.1038/sj.npp.1301241
Pineau, 2007, Constructing evidence-based treatment strategies using methods from computer science, Drug Alcohol Depend., 88, S52, 10.1016/j.drugalcdep.2007.01.005
Sutton, 1998
Wilson, 2007, Multi-task reinforcement learning: a hierarchical Bayesian approach, 1015
Taylor, 2009, Transfer learning for reinforcement learning domains: a survey, J. Mach. Learn. Res., 10, 1633
Lazaric, 2010, Bayesian multi-task reinforcement learning
Li, 2009, Multi-task reinforcement learning in partially observable stochastic environments, J. Mach. Learn. Res., 10, 1131
Bou Ammar, 2014, Online multi-task learning for policy gradient methods
Williams, 1992, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn., 8, 229, 10.1007/BF00992696
Bhatnagar, 2009, Natural actor–critic algorithms, Automatica, 45, 2471, 10.1016/j.automatica.2009.07.008
Peters, 2008, Natural actor-critic, Neurocomputing, 71, 1180, 10.1016/j.neucom.2007.11.026
Ruvolo, 2013, ELLA: an efficient lifelong learning algorithm
Thrun, 1996, Discovering structure in multiple learning tasks: the TC algorithm
Caarls, 2016, Parallel online temporal difference learning for motor control, IEEE Trans. Neural Netw. Learn. Syst., 27, 1457, 10.1109/TNNLS.2015.2442233
S. Gu, E. Holly, T. Lillicrap, S. Levine, 2016, Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, arXiv preprint arXiv:1610.00633
A. Yahya, A. Li, M. Kalakrishnan, Y. Chebotar, S. Levine, 2016, Collective robot reinforcement learning with distributed asynchronous guided policy search, arXiv preprint arXiv:1610.00673
Levine, 2016, End-to-end training of deep visuomotor policies, J. Mach. Learn. Res., 17, 1
Deisenroth, 2014, Multi-task policy search for robotics, 3876
Wilson, 2007, Multi-task reinforcement learning: a hierarchical Bayesian approach
Snel, 2014, Learning potential functions and their representations for multi-task reinforcement learning, Auton. Agent Multi Agent Syst., 28, 637, 10.1007/s10458-013-9235-z
Kumar, 2012, Learning task grouping and overlap in multi-task learning, 1383
Bou Ammar, 2015, Autonomous cross-domain knowledge transfer in lifelong policy gradient reinforcement learning
Boyd, 2011, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3, 1, 10.1561/2200000016
Wei, 2012, Distributed alternating direction method of multipliers, 5445
Tibshirani, 1996, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Series B (Methodological), 58, 267
Peters, 2008, Natural actor-critic, Neurocomputing, 71, 10.1016/j.neucom.2007.11.026