Steering approaches to Pareto-optimal multiobjective reinforcement learning
References
Castelletti, 2002, Reinforcement learning in the operational management of a water system, 325
Oksanen, 2012, Reinforcement learning based sensing policy optimization for energy efficient cognitive radio networks, Neurocomputing, 80, 102, 10.1016/j.neucom.2011.07.027
Liu, 2016, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone, IEEE Transactions on Fuzzy Systems, 24, 16, 10.1109/TFUZZ.2015.2418000
Brys, 2013, On the behaviour of scalarization methods for the engagement of a wet clutch
Roijers, 2013, A survey of multi-objective sequential decision-making, J. Artif. Intell. Res., 48, 67, 10.1613/jair.3987
Vamplew, 2015, Reinforcement learning of Pareto-optimal multiobjective policies using steering, 596
Mannor, 2001, The steering approach for multi-criteria reinforcement learning, 1563
Mannor, 2004, A geometric approach to multi-criterion reinforcement learning, J. Mach. Learn. Res., 5, 325
Vamplew, 2011, Empirical evaluation methods for multiobjective reinforcement learning algorithms, Mach. Learn., 84, 51, 10.1007/s10994-010-5232-5
Shelton, 2001, Importance sampling for reinforcement learning with multiple objectives, AI Technical Report 2001-003, MIT
Chatterjee, 2006, Markov decision processes with multiple objectives, 325
Vamplew, 2009, Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks, 340
Parisi, 2014, Policy gradient approaches for multi-objective sequential decision making, 2323
Handa, 2009, Solving multi-objective reinforcement learning problems by EDA-RL—acquisition of various strategies, 426
Soh, 2011, Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs), 713
Taylor, 2007, Temporal difference and policy search methods for reinforcement learning: an empirical comparison, vol. 22, 1675
Kalyanakrishnan, 2009, An empirical analysis of value function-based and policy search reinforcement learning, 749
Whiteson, 2010, Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning, Auton. Agents Multiagent Syst., 21, 1, 10.1007/s10458-009-9100-2
Roijers, 2013, Computing convex coverage sets for multi-objective coordination graphs, 309
Karlsson, 1997
Guo, 2009, A reinforcement learning approach to setting multi-objective goals for energy demand management, Int. J. Agent Technol. Syst., 1, 55, 10.4018/jats.2009040104
Ferreira, 2012, Multi-agent multi-objective reinforcement learning using heuristically accelerated reinforcement learning, 14
Vamplew, 2008, On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts, 372
Barrett, 2008, Learning all optimal policies with multiple criteria, 41
Van Moffaert, 2014, A novel adaptive weight selection algorithm for multi-objective multi-agent reinforcement learning, 2306
Raicevic, 2006, Parallel reinforcement learning using multiple reward signals, Neurocomputing, 69, 2171, 10.1016/j.neucom.2005.07.008
Lizotte, 2015, Multi-objective Markov decision processes for decision support
Akrour, 2011, Preference-based policy learning, 12
Fürnkranz, 2012, Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Mach. Learn., 89, 123, 10.1007/s10994-012-5313-8
Brinsmead, 2015, Future energy storage trends: an assessment of the economic viability, potential uptake and impacts of electrical energy storage on the NEM 2015–2035
Cavanagh, 2015, Electrical energy storage: technology overview and applications
Sutton, 1996, Generalization in reinforcement learning: successful examples using sparse coarse coding, 1038
Vamplew, 2016, A novel exploration method for multiobjective reinforcement learning, Neurocomputing
Precup, 2001, Off-policy temporal-difference learning with function approximation, 417
Sutton, 2009, A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation, 1609
Lizotte, 2010, Efficient reinforcement learning with multiple reward functions for randomized clinical trial analysis, 695
Van Moffaert, 2014, Learning sets of Pareto optimal policies
Van Moffaert, 2014, Multi-objective reinforcement learning using sets of Pareto dominating policies, J. Mach. Learn. Res., 15, 3483