Survey of Model-Based Reinforcement Learning: Applications on Robotics
Abstract
Keywords
References
Deisenroth, M.P., Neumann, G., Peters, J.: A survey on policy search for robotics. Foundations and Trends in Robotics 2(1–2), 1–142 (2013)
Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32(11), 1238–1274 (2013)
Kormushev, P., Calinon, S., Caldwell, D.G.: Reinforcement learning in robotics: applications and real-world challenges. Robotics 2(3), 122–148 (2013)
Levine, S., Koltun, V.: Learning complex neural network policies with trajectory optimization. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 829–837 (2014)
Deisenroth, M.P., Englert, P., Peters, J., Fox, D.: Multi-task policy search for robotics. In: IEEE International Conference on Robotics and Automation, IEEE, pp. 3876–3881 (2014)
van Rooijen, J., Grondman, I., Babuška, R.: Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy. Mechatronics 24(8), 966–974 (2014)
Wilson, A., Fern, A., Tadepalli, P.: Using trajectory data to improve bayesian optimization for reinforcement learning. J. Mach. Learn. Res. 15(1), 253–282 (2014)
Kupcsik, A., Deisenroth, M.P., Peters, J., Loh, A.P., Vadakkepat, P., Neumann, G.: Model-based contextual policy search for data-efficient generalization of robot skills. Artif. Intell. (2014)
Strahl, J., Honkela, T., Wagner, P.: A Gaussian process reinforcement learning algorithm with adaptability and minimal tuning requirements. In: Artificial Neural Networks and Machine Learning–ICANN 2014, pp. 371–378. Springer (2014)
Boedecker, J., Springenberg, J.T., Wulfing, J., Riedmiller, M.: Approximate real-time optimal control based on sparse Gaussian process models. In: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), IEEE, pp. 1–8 (2014)
Depraetere, B., Liu, M., Pinte, G., Grondman, I., Babuška, R.: Comparison of model-free and model-based methods for time optimal hit control of a badminton robot. Mechatronics 24(8), 1021–1030 (2014)
Guenter, F., Hersch, M., Calinon, S., Billard, A.: Reinforcement learning for imitating constrained reaching movements. Adv. Robot. 21(13), 1521–1544 (2007)
Shaker, M.R., Yue, S., Duckett, T.: Vision-based reinforcement learning using approximate policy iteration. In: International Conference on Advanced Robotics (2009)
Touzet, C.F.: Neural reinforcement learning for behaviour synthesis. Robot. Auton. Syst. 22(3–4), 251–281 (1997)
Boone, G.: Efficient reinforcement learning: model-based Acrobot control. In: Proceedings of International Conference on Robotics and Automation, p. 1 (1997)
Abbeel, P., Quigley, M., Ng, A.Y.: Using inaccurate models in reinforcement learning. In: Proceedings of the 23rd International Conference on Machine Learning - ICML ’06, pp. 1–8. ACM Press, New York, USA (2006)
Morimoto, J., Atkeson, C.G.: Minimax differential dynamic programming: an application to robust biped walking. Adv. Neural Inf. Proces. Syst. 15, 1539–1546 (2003)
Martínez-Marín, T., Duckett, T.: Fast reinforcement learning for vision-guided mobile robots. In: Proceedings - IEEE International Conference on Robotics and Automation, vol. 2005, pp. 4170–4175 (2005)
Martinez-Marin, T.: On-line optimal motion planning for nonholonomic mobile robots. In: Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006, pp. 512–517. IEEE (2006)
Bakker, B., Zhumatiy, V., Gruener, G., Schmidhuber, J.: Quasi-online reinforcement learning for robots. In: Proceedings - IEEE International Conference on Robotics and Automation, vol. 2006, pp. 2997–3002 (2006)
Leffler, B.R., Littman, M.L., Edmunds, T.: Efficient reinforcement learning with relocatable action models. In: Proceedings of the 22nd AAAI Conference on Artificial Intelligence, pp. 572–577 (2007)
Hester, T., Quinlan, M., Stone, P.: Generalized model learning for reinforcement learning on a humanoid robot. In: IEEE International Conference on Robotics and Automation (ICRA), 2010, pp. 2369–2374. IEEE (2010)
Nguyen, T., Li, Z., Silander, T., Leong, T.Y.: Online feature selection for model-based reinforcement learning. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 498–506 (2013)
Van Den Berg, J., Miller, S., Duckworth, D., Hu, H., Wan, A., Fu, X.Y., Goldberg, K., Abbeel, P.: Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations. In: Proceedings - IEEE International Conference on Robotics and Automation, pp. 2074–2081 (2010)
Abbeel, P., Coates, A., Ng, A.Y.: Autonomous helicopter aerobatics through apprenticeship learning. Int. J. Robot. Res. 29(13), 1608–1639 (2010)
Ross, S., Bagnell, J.A.: Agnostic system identification for model-based reinforcement learning. In: Proceedings of the 29th International Conference on Machine Learning, pp. 1703–1710 (2012)
Coates, A., Abbeel, P., Ng, A.Y.: Apprenticeship learning for helicopter control. Commun. ACM 52(7), 97–105 (2009). doi: 10.1145/1538788.1538812
Schneider, J.G.: Exploiting model uncertainty estimates for safe dynamic control learning. In: Advances in Neural Information Processing Systems 9, pp. 1047–1053. MIT Press (1996)
Kuvayev, L., Sutton, R.: Model-based reinforcement learning with an approximate, learned model. In: Proceedings of the Ninth Yale Workshop on Adaptive and Learning Systems, pp. 101–105 (1996)
Hester, T., Quinlan, M., Stone, P.: RTMBA: a real-time model-based reinforcement learning architecture for robot control. In: IEEE International Conference on Robotics and Automation, pp. 85–89 (2012)
Frank, M., Leitner, J., Stollenga, M., Förster, A., Schmidhuber, J.: Curiosity driven reinforcement learning for motion planning on humanoids. Frontiers in Neurorobotics 7, 25 (2014)
Atkeson, C.G.: Nonparametric model-based reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 1008–1014 (1998)
Yamaguchi, A., Atkeson, C.G.: Neural networks and differential dynamic programming for reinforcement learning problems. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 5434–5441. IEEE (2016)
Howard, R.: Dynamic Programming and Markov Processes. Technology Press of the Massachusetts Institute of Technology (1960)
Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)
Peters, J., Vijayakumar, S., Schaal, S.: Reinforcement learning for humanoid robotics. In: Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots, pp. 1–20 (2003)
Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. J. Mach. Learn. Res. 4, 1107–1149 (2003)
Lagoudakis, M., Parr, R., Littman, M.: Least-squares methods in reinforcement learning for control. In: Vlahavas, I., Spyropoulos, C. (eds.) Methods and Applications of Artificial Intelligence. Volume 2308 of Lecture Notes in Computer Science, pp. 249–260. Springer, Berlin, Heidelberg (2002)
Moore, A.W., Atkeson, C.G.: Prioritized sweeping: reinforcement learning with less data and less time. Mach. Learn. 13(1), 103–130 (1993)
Brafman, R.I., Tennenholtz, M.: R-max – a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res. 3, 213–231 (2003)
Sherstov, A.A., Stone, P.: Improving action selection in MDPs via knowledge transfer. In: AAAI, vol. 5, pp. 1024–1029 (2005)
Lang, T., Toussaint, M., Kersting, K.: Exploration in relational domains for model-based reinforcement learning. J. Mach. Learn. Res. 13, 3725–3768 (2012)
Martínez, D., Alenya, G., Torras, C.: Relational reinforcement learning with guided demonstrations. Artif. Intell. (2015)
Martínez, D., Alenya, G., Torras, C.: Safe robot execution in model-based reinforcement learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 6422–6427 (2015)
Yamaguchi, A., Atkeson, C.G.: Differential dynamic programming with temporally decomposed dynamics. In: IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), 2015, pp. 696–703 (2015)
Andersson, O., Heintz, F., Doherty, P.: Model-based reinforcement learning in continuous environments using real-time constrained optimization. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI15) (2015)
Anderson, B.D., Moore, J.B.: Optimal control: linear quadratic methods. Courier Corporation (2007)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont, MA (1995)
Bradtke, S.J.: Incremental dynamic programming for on-line adaptive optimal control. PhD thesis, University of Massachusetts, Amherst, MA, USA. UMI Order No. GAX95-10446 (1995)
Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Technical Report 166, Cambridge University Engineering Department (1994)
Watkins, C., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
Sutton, R.S.: Dyna, an integrated architecture for learning, planning, and reacting. ACM SIGART Bull. 2(4), 160–163 (1991)
Bagnell, J., Schneider, J.: Autonomous helicopter control using reinforcement learning policy search methods. In: IEEE International Conference on Robotics and Automation, vol. 2, pp. 1615–1620 (2001)
El-Fakdi, A., Carreras, M.: Policy gradient based reinforcement learning for real autonomous underwater cable tracking. In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3635–3640. IEEE (2008)
El-Fakdi, A., Carreras, M.: Two-step gradient-based reinforcement learning for underwater robotics behavior learning. Robot. Auton. Syst. 61(3), 271–282 (2013)
Morimoto, J., Atkeson, C.G.: Nonparametric representation of an approximated Poincaré map for learning biped locomotion. Auton. Robot. 27(2), 131–144 (2009)
Ng, A.Y., Kim, H.J., Jordan, M.I., Sastry, S.: Autonomous helicopter flight via reinforcement learning. Adv. Neural Inf. Proces. Syst. 16(16), 363–372 (2004)
Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 1889–1897 (2015)
Deisenroth, M., Rasmussen, C., Fox, D.: Learning to control a low-cost manipulator using data-efficient reinforcement learning. In: Robotics: Science and Systems (RSS) (2011)
Deisenroth, M.P., Calandra, R., Seyfarth, A., Peters, J.: Toward fast policy search for learning legged locomotion. In: IEEE International Conference on Intelligent Robots and Systems, pp. 1787–1792 (2012)
Koppejan, R., Whiteson, S.: Neuroevolutionary reinforcement learning for generalized helicopter control. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation - GECCO ’09, p. 145. ACM Press, New York, USA (2009)
Kupcsik, A., Deisenroth, M., Peters, J., Neumann, G.: Data-efficient generalization of robot skills with contextual policy search. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2013)
Levine, S., Koltun, V.: Variational policy search via trajectory optimization. In: Advances in Neural Information Processing Systems, pp. 207–215 (2013)
Deisenroth, M., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: 28th International Conference on Machine Learning, pp. 465–472 (2011)
Englert, P., Paraschos, A., Peters, J., Deisenroth, M.P.: Model-based imitation learning by probabilistic trajectory matching. In: IEEE International Conference on Robotics and Automation, pp. 1922–1927 (2013)
Mordatch, I., Mishra, N., Eppner, C., Abbeel, P.: Combining model-based policy search with online model learning for control of physical humanoids. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 242–248 (2016)
Tangkaratt, V., Mori, S., Zhao, T., Morimoto, J., Sugiyama, M.: Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation. Neural Netw. 57, 128–140 (2014)
Ko, J., Klein, D.J., Fox, D., Haehnel, D.: Gaussian processes and reinforcement learning for identification and control of an autonomous blimp. In: Proceedings 2007 IEEE International Conference on Robotics and Automation, pp. 742–747 (2007)
Michels, J., Saxena, A., Ng, A.Y.: High speed obstacle avoidance using monocular vision and reinforcement learning. In: Proceedings of the 22nd International Conference on Machine Learning, ACM, pp. 593–600 (2005)
Williams, G., Drews, P., Goldfain, B., Rehg, J.M., Theodorou, E.A.: Aggressive driving with model predictive path integral control. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 1433–1440 (2016)
Baxter, J., Bartlett, P.L.: Direct gradient-based reinforcement learning. In: 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), Geneva, vol. 3, pp. 271–274. IEEE (2000)
Girard, A., Rasmussen, C.E., Candela, J.Q., Murray-Smith, R.: Gaussian process priors with uncertain inputs application to multiple-step ahead time series forecasting. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 545–552. MIT Press (2003)
Deisenroth, M.P.: Efficient Reinforcement Learning Using Gaussian Processes, vol. 9. KIT Scientific Publishing (2010)
Ng, A.Y., Jordan, M.: PEGASUS: a policy search method for large MDPs and POMDPs. In: Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 406–415. Morgan Kaufmann Publishers Inc (2000)
Peters, J., Mulling, K., Altun, Y.: Relative entropy policy search. In: Twenty-Fourth AAAI Conference on Artificial Intelligence (2010)
Theodorou, E., Buchli, J., Schaal, S.: A generalized path integral control approach to reinforcement learning. J. Mach. Learn. Res. 11(Nov), 3137–3181 (2010)
Pan, Y., Theodorou, E., Kontitsis, M.: Sample efficient path integral control under uncertainty. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 2314–2322. Curran Associates, Inc (2015)
Colomé, A., Planells, A., Torras, C.: A friction-model-based framework for reinforcement learning of robotic tasks in non-rigid environments. In: 2015 IEEE International Conference on Robotics and Automation, (ICRA), pp. 5649–5654. IEEE (2015)
Theodorou, E., Buchli, J., Schaal, S.: Reinforcement learning of motor skills in high dimensions: a path integral approach. In: IEEE International Conference on Robotics and Automation (ICRA), 2010, IEEE, pp. 2397–2403 (2010)
Kober, J., Peters, J.R.: Policy search for motor primitives in robotics. In: Advances in Neural Information Processing Systems, pp. 849–856 (2009)
Polydoros, A.S., Nalpantidis, L.: A reservoir computing approach for learning forward dynamics of industrial manipulators. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, IEEE, pp. 612–618 (2016)
Schaal, S., Atkeson, C.G.: Constructive incremental learning from only local information. Neural Comput. 10, 2047–2084 (1997)
Atkeson, C.G., Moore, A.W., Schaal, S.: Locally weighted learning for control. In: Lazy Learning, pp. 75–113. Springer (1997)
Quinlan, J.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
Rasmussen, C.E.: Gaussian processes in machine learning. In: Advanced Lectures on Machine Learning, pp. 63–71. Springer (2004)
Albus, J.S.: A new approach to manipulator control: the cerebellar model articulation controller (CMAC). J. Dyn. Syst. Meas. Control. 97(3), 220–227 (1975)
Zufiria, P., Martínez-Marín, T.: Improved optimal control methods based upon the adjoining cell mapping technique. J. Optim. Theory Appl. 118(3), 657–680 (2003)
Moore, A.W., Schneider, J.: Memory-based stochastic optimization. In: Touretzky, D., Mozer, M., Hasselmo, M. (eds.) Advances in Neural Information Processing Systems 8, pp. 1066–1072. MIT Press (1996)
Sugiyama, M., Takeuchi, I., Suzuki, T., Kanamori, T., Hachiya, H., Okanohara, D.: Least-squares conditional density estimation. IEICE Trans. Inf. Syst. 93(3), 583–594 (2010)
Tangkaratt, V., Morimoto, J., Sugiyama, M.: Model-based reinforcement learning with dimension reduction. Neural Netw. 84, 1–16 (2016)