Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences
Abstract
Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to learn reward functions directly from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has applied reward learning to these data sources independently. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, collected either passively or actively from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries that are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and usability of our integrated framework.
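Since the abstract describes the algorithm only at a high level, the following is a minimal Python sketch of the two-stage idea, not the authors' implementation. It assumes rewards linear in trajectory features (R(xi) = w . phi(xi)), a discrete sample-based belief over the weights w, a Boltzmann model for demonstrations, a logistic choice model for preference answers, and greedy information-gain query selection; the rationality coefficient BETA, the random stand-in trajectory features, and the simulated user are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)
N_SAMPLES, N_FEATURES = 2000, 4   # size of the sample-based belief; feature dimension
BETA = 5.0                        # assumed rationality coefficient for demonstrations

# Belief over reward weights w: unit-norm samples with importance weights p.
W = rng.normal(size=(N_SAMPLES, N_FEATURES))
W /= np.linalg.norm(W, axis=1, keepdims=True)
p = np.full(N_SAMPLES, 1.0 / N_SAMPLES)

def logsumexp(x, axis=0):
    m = x.max(axis=axis)
    return m + np.log(np.exp(x - m).sum(axis=axis))

def init_from_demos(p, demo_features, candidate_features):
    # Stage 1: reweight the belief by the Boltzmann likelihood of each
    # demonstration, normalized over a finite candidate trajectory set
    # (a common approximation of the intractable partition function).
    for phi_d in demo_features:
        log_lik = BETA * W @ phi_d - logsumexp(BETA * candidate_features @ W.T, axis=0)
        p = p * np.exp(log_lik - log_lik.max())
        p /= p.sum()
    return p

def pref_likelihood(phi_a, phi_b):
    # Logistic choice model: P(user prefers trajectory A over B | w).
    return 1.0 / (1.0 + np.exp(-W @ (phi_a - phi_b)))

def info_gain(p, phi_a, phi_b):
    # Expected information gain of asking "A or B?": the answer-weighted
    # KL divergence between the updated belief and the current one.
    q = pref_likelihood(phi_a, phi_b)
    gain = 0.0
    for lik in (q, 1.0 - q):
        prob_ans = float(p @ lik)
        post = p * lik / max(prob_ans, 1e-12)
        post /= post.sum()
        gain += prob_ans * np.sum(post * (np.log(post + 1e-12) - np.log(p + 1e-12)))
    return gain

# Stand-in trajectory features; a real system would roll out a dynamics model.
candidates = rng.normal(size=(30, N_FEATURES))
p = init_from_demos(p, candidates[:2], candidates)  # pretend the first two were demonstrated

# Stage 2: actively ask the preference query with the highest expected info gain.
true_w = rng.normal(size=N_FEATURES)
true_w /= np.linalg.norm(true_w)
pairs = [(i, j) for i in range(len(candidates)) for j in range(i + 1, len(candidates))]
for _ in range(10):
    i, j = max(pairs, key=lambda ij: info_gain(p, candidates[ij[0]], candidates[ij[1]]))
    lik = pref_likelihood(candidates[i], candidates[j])
    if true_w @ candidates[i] <= true_w @ candidates[j]:  # simulated user's answer
        lik = 1.0 - lik
    p = p * lik
    p /= p.sum()

w_hat = p @ W
print("alignment with true reward:", w_hat @ true_w / np.linalg.norm(w_hat))

Selecting queries by expected information gain, rather than by belief volume alone, is one natural way such a framework can trade off how informative a question is against how easy it is for the human to answer.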
Keywords