Steering approaches to Pareto-optimal multiobjective reinforcement learning

Neurocomputing - Volume 263 - Pages 26-38 - 2017

Peter Vamplew¹, Rustam Issabekov¹, Richard Dazeley¹, Cameron Foale¹, Adam Berry², Tim Moore², Douglas Creighton³

¹Federation Learning Agents Group, School of Engineering and Information Technology, Federation University Australia, Ballarat, Victoria, Australia

²Energy Technology Division, CSIRO, Mayfield West, NSW, Australia

³Centre for Intelligent Systems Research, Deakin University, Waurn Ponds, Victoria, Australia

