A multi-agent reinforcement learning method with curriculum transfer for large-scale dynamic traffic signal control

Springer Science and Business Media LLC - Volume 53 - Pages 21433-21447 - 2023
Xuesi Li1, Jingchen Li1, Haobin Shi1
1School of Computer Science, Northwestern Polytechnical University, Xi’an, China

Abstract

Using reinforcement learning to control traffic signal systems has been discussed in recent years, but most work focuses on simple scenarios such as a single intersection, and methods aimed at large-scale traffic scenarios suffer from long training times and suboptimal results. In this work, we develop a new multi-agent reinforcement learning model for large-scale traffic signal control tasks, together with a curriculum transfer method that optimizes the joint policy step by step. The policies for different intersections are trained in a partially observable Markov decision process under centralized training and decentralized execution, and we design transformer modules based on the attention mechanism for both the policy network and the critic network. We first train the policies in a simple traffic scenario; these policies are then transferred to the next curriculum through policy reloading, while experiences from the source task are reused selectively. As the number of agents grows, our method reaches satisfactory performance quickly by reusing knowledge from earlier curricula. We conduct several experiments on the Cityflow platform. In scenarios with more than 10 intersections, our model improves the average reward from 3.0 to 5.0.
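
The two transfer steps named in the abstract, policy reloading and selective reuse of source-task experience, can be illustrated with a minimal Python sketch. The abstract does not give the mechanics, so everything here is an assumption made for illustration: the names `Transition`, `reload_policies`, and `select_experiences` are hypothetical, policies are represented as plain parameter dictionaries, source parameters are reused round-robin when the target curriculum has more intersections, and reward is used as the selection score. It is not the authors' implementation.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transition:
    """One stored step of experience (fields are illustrative)."""
    observation: List[float]
    action: int
    reward: float


def reload_policies(source: List[Dict[str, List[float]]],
                    n_target_agents: int) -> List[Dict[str, List[float]]]:
    """Initialise target-curriculum policies from source-task parameters.

    Assumption: when the target task has more agents than the source task,
    source parameters are reused round-robin.
    """
    if not source:
        raise ValueError("need at least one source policy")
    # Deep copies keep later fine-tuning from modifying the source policies.
    return [copy.deepcopy(source[i % len(source)])
            for i in range(n_target_agents)]


def select_experiences(buffer: List[Transition],
                       keep_fraction: float) -> List[Transition]:
    """Keep the highest-reward fraction of source-task transitions.

    Assumption: reward serves as the selection score; the abstract only
    says that experiences are reused selectively.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not buffer:
        return []
    n_keep = max(1, int(len(buffer) * keep_fraction))
    ranked = sorted(buffer, key=lambda t: t.reward, reverse=True)
    return ranked[:n_keep]
```

In this sketch, a curriculum step from 2 to 5 intersections would call `reload_policies(source_policies, 5)` to seed the new agents and `select_experiences(source_buffer, 0.5)` to pre-fill the new replay buffer before training continues.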

Keywords

#reinforcement learning #traffic signal control #multi-agent model #curriculum transfer #Markov decision process.
