Reinforcement learning for robot soccer

Martin Riedmiller, Thomas Gabel, Roland Hafner, Sascha Lange
Department of Computer Science, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany

Abstract

Keywords

