An Empirical Study on Google Research Football Multi-agent Scenarios

Yan Song1, He Jiang2, Zheng Tian3, Haifeng Zhang1, Yingping Zhang4, Jiangcheng Zhu4, Zonghong Dai4, Weinan Zhang5, Jun Wang6
1Institute of Automation, Chinese Academy of Sciences, Beijing, China
2Digital Brain Lab, Shanghai, China
3ShanghaiTech University, Shanghai, China
4Huawei Cloud, Guiyang, China
5Shanghai Jiao Tong University, Shanghai, China
6University College London, London, UK

Abstract

Few multi-agent reinforcement learning (MARL) studies on Google research football (GRF) focus on the 11-vs-11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark on this scenario has been released to the public. In this work, we fill the gap by providing a population-based MARL training pipeline and hyperparameter settings for the multi-agent football scenario that, trained from scratch, outperforms the built-in bot at difficulty 1.0 within 2 million steps. Our experiments serve as a reference for the expected performance of independent proximal policy optimization (IPPO), a state-of-the-art multi-agent reinforcement learning algorithm in which each agent independently optimizes its own policy, across various training configurations. Meanwhile, we release our training framework Light-MALib, which extends the MALib codebase with a distributed and asynchronous implementation and additional analytical tools for football games. Finally, we provide guidance for building strong football AI with population-based training and release diverse pretrained policies for benchmarking. The goal is to give the community a head start for experimenting on GRF and a simple-to-use population-based training framework for further improving agents through self-play. The implementation is available at https://github.com/Shanghai-Digital-Brain-Laboratory/DB-Football.
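For readers unfamiliar with IPPO, each agent independently optimizes the standard PPO clipped surrogate objective on its own observations and actions, treating the other agents as part of the environment. The sketch below uses our own notation for the generic formulation; it is not reproduced from this paper's method section. For agent $i$ with policy parameters $\theta_i$,

$$
L^i(\theta_i)=\mathbb{E}_t\!\left[\min\!\left(r_t^i(\theta_i)\,\hat{A}_t^i,\ \operatorname{clip}\!\left(r_t^i(\theta_i),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t^i\right)\right],
\qquad
r_t^i(\theta_i)=\frac{\pi_{\theta_i}\!\left(a_t^i\mid o_t^i\right)}{\pi_{\theta_i^{\mathrm{old}}}\!\left(a_t^i\mid o_t^i\right)},
$$

where $o_t^i$ and $a_t^i$ are agent $i$'s local observation and action, $\hat{A}_t^i$ is its advantage estimate (e.g., computed with generalized advantage estimation), and $\epsilon$ is the clipping parameter.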
