Reinforcement learning in robotics: A survey
Abstract
Reinforcement learning offers robotics a framework and a set of tools for designing sophisticated behaviors that are hard to engineer by hand. Conversely, the challenges posed by robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the two fields holds enough promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both the key challenges in reinforcement learning for robotics and its notable successes. We discuss how these contributions tamed the complexity of the domain, and we study the role of algorithms, representations, and prior knowledge in achieving those successes. Accordingly, a particular focus of our paper lies on the choice between model-based and model-free methods, as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail, we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout the many open questions and the tremendous potential for future research.
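To make the model-free, value-function-based family mentioned above concrete, the sketch below runs tabular Q-learning on a toy chain MDP. Everything here (the chain environment, state/action encoding, and all hyperparameters) is an illustrative assumption for exposition, not an example taken from the survey itself.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1,
                     gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain MDP: states 0..n_states-1,
    actions 0 (left) and 1 (right); reaching the rightmost state
    ends the episode with reward +1, every other step pays 0."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q-table: q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            # deterministic chain dynamics
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # one-step temporal-difference update
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning_chain()
greedy = [0 if qa[0] > qa[1] else 1 for qa in q]
print(greedy)  # greedy policy per state after learning
```

After a few hundred episodes the greedy policy moves right in every non-terminal state, i.e. it has recovered the optimal behavior from reward alone. A policy-search method would instead parameterize and perturb the policy directly, never maintaining a value table; that trade-off is one of the choices the survey examines.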
Keywords