Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences

International Journal of Robotics Research - Volume 41, Issue 1, pp. 45-67 - 2022
Erdem Bıyık1, Dylan P. Losey2, Malayandi Palan2, Nicholas C. Landolfi2, Gleb Shevchuk2, Dorsa Sadigh2,1
1Department of Electrical Engineering, Stanford University, Stanford, CA, USA
2Department of Computer Science, Stanford University, Stanford, CA, USA

Abstract

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries that are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and the usability of our integrated framework.
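The two-stage procedure the abstract describes — initialize a belief over reward functions from demonstrations, then refine it with actively chosen preference queries — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear reward model, the Boltzmann demonstration likelihood, the uncertainty-based query selection rule, and all function names (`belief_from_demos`, `pick_query`, `update_belief`) are simplifying assumptions for exposition.

```python
# Hedged sketch: rewards are assumed linear in trajectory features,
# R(xi) = w . phi(xi), and the belief over w is a weighted sample set.
import math
import random

random.seed(0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# --- Stage 1: initialize the belief from a demonstration -------------------
# Sample candidate reward weights on the unit sphere and weight them by a
# Boltzmann (maximum-entropy) likelihood of the demonstrated features.
def belief_from_demos(demo_phi, n_samples=500, beta=5.0):
    samples, weights = [], []
    for _ in range(n_samples):
        w = [random.gauss(0, 1) for _ in range(len(demo_phi))]
        norm = math.sqrt(dot(w, w))
        w = [x / norm for x in w]
        samples.append(w)
        weights.append(math.exp(beta * dot(w, demo_phi)))
    z = sum(weights)
    return samples, [p / z for p in weights]

# --- Stage 2: actively pick and incorporate preference queries -------------
# Probability (under one weight sample) that the user prefers trajectory A.
def pref_prob(w, phi_a, phi_b):
    return 1.0 / (1.0 + math.exp(-(dot(w, phi_a) - dot(w, phi_b))))

# Choose the candidate pair whose answer is most uncertain under the current
# belief -- a simple stand-in for the paper's optimal query-selection rule.
def pick_query(samples, probs, candidates):
    def uncertainty(pair):
        p = sum(pr * pref_prob(w, *pair) for w, pr in zip(samples, probs))
        return -abs(p - 0.5)  # closest to 50/50 is most informative
    return max(candidates, key=uncertainty)

# Bayesian update of the sample weights given the user's answer.
def update_belief(samples, probs, phi_a, phi_b, prefers_a):
    lik = [pref_prob(w, phi_a, phi_b) if prefers_a
           else 1.0 - pref_prob(w, phi_a, phi_b) for w in samples]
    post = [p * l for p, l in zip(probs, lik)]
    z = sum(post)
    return [p / z for p in post]

# The demonstration exhibits feature 0; queries then sharpen the belief.
samples, probs = belief_from_demos([1.0, 0.0])
candidates = [([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [0.4, 0.6])]
phi_a, phi_b = pick_query(samples, probs, candidates)
probs = update_belief(samples, probs, phi_a, phi_b, prefers_a=True)
mean_w = [sum(p * w[i] for w, p in zip(samples, probs)) for i in (0, 1)]
```

After the demonstration-based initialization and one preference update, the posterior mean weight on the demonstrated feature dominates, illustrating how the two data sources compound.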
