An Empirical Study on Google Research Football Multi-agent Scenarios

Yan Song1, He Jiang2, Zheng Tian3, Haifeng Zhang1, Yingping Zhang4, Jiangcheng Zhu4, Zonghong Dai4, Weinan Zhang5, Jun Wang6
1Institute of Automation, Chinese Academy of Sciences, Beijing, China
2Digital Brain Lab, Shanghai, China
3ShanghaiTech University, Shanghai, China
4Huawei Cloud, Guiyang, China
5Shanghai Jiao Tong University, Shanghai, China
6University College London, London, UK

Abstract

Few multi-agent reinforcement learning (MARL) studies on Google research football (GRF) focus on the 11-vs-11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark on this scenario has been released to the public. In this work, we fill the gap by providing a population-based MARL training pipeline and hyperparameter settings for the multi-agent football scenario that, trained from scratch, outperforms the built-in bot at difficulty 1.0 within 2 million steps. Our experiments serve as a reference for the expected performance of independent proximal policy optimization (IPPO), a state-of-the-art multi-agent reinforcement learning algorithm in which each agent independently optimizes its own policy, across various training configurations. Meanwhile, we release our training framework Light-MALib, which extends the MALib codebase with a distributed and asynchronous implementation and additional analytical tools for football games. Finally, we provide guidance for building strong football AI with population-based training and release diverse pretrained policies for benchmarking. The goal is to give the community a head start for experimenting on GRF and a simple-to-use population-based training framework for further improving agents through self-play. The implementation is available at https://github.com/Shanghai-Digital-Brain-Laboratory/DB-Football.
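For readers unfamiliar with IPPO, each agent independently optimizes the standard PPO clipped surrogate objective on its own observations and actions, treating the other agents as part of the environment. The sketch below uses our own notation for the generic formulation; it is not reproduced from this paper's method section. For agent $i$ with policy parameters $\theta_i$,

$$
L^i(\theta_i)=\mathbb{E}_t\!\left[\min\!\left(r_t^i(\theta_i)\,\hat{A}_t^i,\ \operatorname{clip}\!\left(r_t^i(\theta_i),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t^i\right)\right],
\qquad
r_t^i(\theta_i)=\frac{\pi_{\theta_i}\!\left(a_t^i\mid o_t^i\right)}{\pi_{\theta_i^{\mathrm{old}}}\!\left(a_t^i\mid o_t^i\right)},
$$

where $o_t^i$ and $a_t^i$ are agent $i$'s local observation and action, $\hat{A}_t^i$ is its advantage estimate (e.g., computed with generalized advantage estimation), and $\epsilon$ is the clipping parameter.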
