Steering approaches to Pareto-optimal multiobjective reinforcement learning

Neurocomputing - Volume 263 - Pages 26-38 - 2017
Peter Vamplew1, Rustam Issabekov1, Richard Dazeley1, Cameron Foale1, Adam Berry2, Tim Moore2, Douglas Creighton3
1Federation Learning Agents Group, School of Engineering and Information Technology, Federation University Australia, Ballarat, Victoria, Australia
2Energy Technology Division, CSIRO, Mayfield West, NSW, Australia
3Centre for Intelligent Systems Research, Deakin University, Waurn Ponds, Victoria, Australia

References

Castelletti, 2002, Reinforcement learning in the operational management of a water system, 325
Oksanen, 2012, Reinforcement learning based sensing policy optimization for energy efficient cognitive radio networks, Neurocomputing, 80, 102, 10.1016/j.neucom.2011.07.027
Liu, 2016, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Transactions on Fuzzy Systems, 24, 16, 10.1109/TFUZZ.2015.2418000
Brys, 2013, On the behaviour of scalarization methods for the engagement of a wet clutch
Roijers, 2013, A survey of multi-objective sequential decision-making, J. Artif. Intell. Res., 48, 67, 10.1613/jair.3987
Vamplew, 2015, Reinforcement learning of Pareto-optimal multiobjective policies using steering, 596
Mannor, 2001, The steering approach for multi-criteria reinforcement learning, 1563
Mannor, 2004, A geometric approach to multi-criterion reinforcement learning, J. Mach. Learn. Res., 5, 325
Vamplew, 2011, Empirical evaluation methods for multiobjective reinforcement learning algorithms, Mach. Learn., 84, 51, 10.1007/s10994-010-5232-5
C. Shelton, Importance sampling for reinforcement learning with multiple objectives, 2001, AI Technical Report, number 2001-003, MIT.
Chatterjee, 2006, Markov decision processes with multiple objectives, 325
Vamplew, 2009, Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks, 340
Parisi, 2014, Policy gradient approaches for multi-objective sequential decision making, 2323
Handa, 2009, Solving multi-objective reinforcement learning problems by EDA-RL - acquisition of various strategies, 426
Soh, 2011, Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs), 713
Taylor, 2007, Temporal difference and policy search methods for reinforcement learning: an empirical comparison, vol. 22, 1675
Kalyanakrishnan, 2009, An empirical analysis of value function-based and policy search reinforcement learning, 749
Whiteson, 2010, Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning, Auton. Agents Multiagent Syst., 21, 1, 10.1007/s10458-009-9100-2
Roijers, 2013, Computing convex coverage sets for multi-objective coordination graphs, 309
Karlsson, 1997
Guo, 2009, A reinforcement learning approach to setting multi-objective goals for energy demand management, Int. J. Agent Technol. Syst., 1, 55, 10.4018/jats.2009040104
Ferreira, 2012, Multi-agent multi-objective reinforcement learning using heuristically accelerated reinforcement learning, 14
Vamplew, 2008, On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts, 372
Barrett, 2008, Learning all optimal policies with multiple criteria, 41
Moffaert, 2014, A novel adaptive weight selection algorithm for multi-objective multi-agent reinforcement learning, 2306
Raicevic, 2006, Parallel reinforcement learning using multiple reward signals, Neurocomputing, 69, 2171, 10.1016/j.neucom.2005.07.008
Lizotte, 2015, Multi-objective Markov decision processes for decision support
Akrour, 2011, Preference-based policy learning, 12
Fürnkranz, 2012, Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Mach. Learn., 89, 123, 10.1007/s10994-012-5313-8
Brinsmead, 2015, Future energy storage trends: an assessment of the economic viability, potential uptake and impacts of electrical energy storage on the NEM 2015–2035
Cavanagh, 2015, Electrical energy storage: technology overview and applications
Sutton, 1996, Generalization in reinforcement learning: successful examples using sparse coarse coding, 1038
Vamplew, 2016, A novel exploration method for multiobjective reinforcement learning, Neurocomputing
Precup, 2001, Off-policy temporal-difference learning with function approximation, 417
Sutton, 2009, A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation, 1609
Lizotte, 2010, Efficient reinforcement learning with multiple reward functions for randomized clinical trial analysis, 695
Van Moffaert, 2014, Learning sets of Pareto optimal policies
Van Moffaert, 2014, Multi-objective reinforcement learning using sets of Pareto dominating policies, J. Mach. Learn. Res., 15, 3483