Reinforcement learning in robotics: A survey

International Journal of Robotics Research - Vol. 32, No. 11, pp. 1238-1274 - 2013
Jens Kober1,2, J. Andrew Bagnell3, Jan Peters4,5
1Bielefeld University, CoR-Lab Research Institute for Cognition and Robotics, Bielefeld, Germany
2Honda Research Institute Europe, Offenbach/Main, Germany
3Carnegie Mellon University, Robotics Institute, Pittsburgh, PA, USA
4Max Planck Institute for Intelligent Systems, Department of Empirical Inference, Tübingen, Germany
5Technische Universität Darmstadt, FB Informatik, FG Intelligent Autonomous Systems, Darmstadt, Germany

Abstract

Reinforcement learning offers robotics a framework and a set of tools for the design of sophisticated, hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the two disciplines holds enough promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning and notable successes. We discuss how these contributions tamed the complexity of the domain, and we study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free methods, as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail, we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.
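The central distinction the abstract draws, between value-function-based and policy-search methods, can be illustrated with the simplest representative of the former family: tabular Q-learning. The sketch below is not taken from the survey; it is a minimal, self-contained example on a hypothetical chain-world MDP (states 0..n-1, actions left/right, reward 1 at the rightmost state) to show what a model-free, value-function-based learner looks like in code.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain MDP.

    States 0..n_states-1; action 0 moves left, action 1 moves right
    (clipped at the ends). Reaching the rightmost state ends the
    episode with reward 1; all other transitions give reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the greedy value of s_next.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q
```

After training, the greedy policy derived from the learned table prefers "right" in every non-terminal state. A policy-search method, by contrast, would parameterize and optimize the policy directly, without maintaining such a value table; the survey examines when each choice pays off on physical robots.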
