Routing optimization with Monte Carlo Tree Search-based multi-agent reinforcement learning

Springer Science and Business Media LLC - Tập 53 - Trang 25881-25896 - 2023
Qi Wang1, Yongsheng Hao2,3
1Information Science and Technology College, Dalian Maritime University, Dalian, China
2School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing, China
3Network Center, Nanjing University of Information Science & Technology, Nanjing, China

Tóm tắt

Vehicle routing (VRP) and traveling salesman problems (TSP) are classical and interesting NP-hard routing combinatorial optimization (CO) with practical significance. While moving forward with artificial intelligence, researchers are paying more and more attention to applying machine learning to classical CO problems. However, traditional reinforcement learning faces challenges like reward sparsity and unstable training, so it is necessary to assist agents in finding high-quality routings in the initial model training stage to obtain more positive feedback. This paper proposes a novel Monte Carlo Tree Search (MCTS)-based two-stage multi-agent reinforcement learning training pipeline (MCRL) in which we also design a multifunctional reward function, improving efficiency, accuracy, and diversity to guide agents to learn the routings over graphs better. Besides, previous approaches are frequently too sluggish in runtime to be useful in contexts with sparsely connected networks and uncertain traffic. As an alternative, we design a model based on graph neural networks that can execute multi-agent routing in a sparsely connected graph with constantly changing traffic circumstances. Also, the agents are better equipped to collaborate online and adjust to changes thanks to our learned communication module.

Tài liệu tham khảo

Castro-Gutierrez J, Landa-Silva D, Moreno Pérez J (2011) Nature of real-world multi-objective vehicle routing with evolutionary algorithms. Conf. Proc. - IEEE Int. Conf. Syst. Man Cybern, pp 257–264. https://doi.org/10.1109/ICSMC.2011.6083675. Lu H, Zhou R, Fei Z, Guan C (2019) Spatial-domain fitness landscape analysis for combinatorial optimization. Inf Sci (Ny) 472:126–144. https://doi.org/10.1016/j.ins.2018.09.019 Niu Y, Shao J, Xiao J, Song W, Cao Z (2022) Multi-objective evolutionary algorithm based on RBF network for solving the stochastic vehicle routing problem. Inf Sci (Ny) 609:387–410. https://doi.org/10.1016/j.ins.2022.07.087 Chen C, Demir E, Huang Y (2021) An adaptive large neighborhood search heuristic for the vehicle routing problem with time windows and delivery robots. Eur J Oper Res 294:1164–1180. https://doi.org/10.1016/j.ejor.2021.02.027 Windras Mara ST, Norcahyo R, Jodiawan P, Lusiantoro L, Rifai AP (2022) A survey of adaptive large neighborhood search algorithms and applications. Comput Oper Res 146:105903. https://doi.org/10.1016/j.cor.2022.105903 Wang Q, Tang C (2021) Deep reinforcement learning for transportation network combinatorial optimization: a survey. Knowl Based Syst 233:107526. https://doi.org/10.1016/j.knosys.2021.107526 Ecoffet A, Huizinga J, Lehman J, Stanley KO, Clune J (2021) First return, then explore. Nature 590:580–586. https://doi.org/10.1038/s41586-020-03157-9 Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, Lillicrap T, Simonyan K, Hassabis D (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science (80-. ) 362:1140–1144. https://doi.org/10.1126/science.aar6404 Koster R, Balaguer J, Tacchetti A, Weinstein A, Zhu T, Hauser O, Williams D, Campbell-Gillingham L, Thacker P, Botvinick M, Summerfield C (2022) Human-centred mechanism design with democratic AI. Nat Hum Behav 6:1398–1407. https://doi.org/10.1038/s41562-022-01383-x Fawzi A, Balog M, Huang A, Hubert T, Romera-Paredes B, Barekatain M, Novikov A, Francisco FJ, Schrittwieser J, Swirszcz G, Silver D, Hassabis D, Kohli P (2022) Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610:47–53. https://doi.org/10.1038/s41586-022-05172-4 Wang Q, He Y, Tang C (2022) Mastering construction heuristics with self-play deep reinforcement learning. Neural Comput Appl 35(6):4723–4738. https://doi.org/10.1007/s00521-022-07989-6 Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5999–6009 Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. In: Advances in neural information processing systems, pp 2692–2700 Nazari M, Oroojlooy A, Takáč M, Snyder LV (2018) Reinforcement learning for solving the vehicle routing problem. In: Advances in neural information processing systems, pp 9839–9849 Kool W, Van Hoof H, Welling M (2019) Attention, learn to solve routing problems! In: 7th international conference on learning representations, ICLR 2019, pp 1–25 Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS (2021) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 32:4–24. https://doi.org/10.1109/TNNLS.2020.2978386 Wang Q, Lai KH, Tang C (2023) Solving combinatorial optimization problems over graphs with BERT-based deep reinforcement learning. Inf Sci (Ny) 619:930–946. https://doi.org/10.1016/j.ins.2022.11.073 Wang Q, Hao Y, Cao J (2021) Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework. Eng Appl Artif Intell 105:104422. https://doi.org/10.1016/j.engappai.2021.104422 Wang Q (2021) VARL: a variational autoencoder-based reinforcement learning framework for vehicle routing problems. Appl Intell 52:8910–8923. https://doi.org/10.1007/s10489-021-02920-3 Dai H, Khalil EB, Zhang Y, Dilkina B, Song L (2017) Learning combinatorial optimization algorithms over graphs. In: Advances in neural information processing systems, pp 6349–6359 Bello I, Pham H, Le QV, Norouzi M, Bengio S (2019) Neural combinatorial optimization with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017 - workshop track proceedings, pp 1–15 Li Z, Chen Q, Koltun V (2018) Combinatorial optimization with graph convolutional networks and guided tree search. In: Advances in neural information processing systems, pp 539–548 Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529:484–489. https://doi.org/10.1038/nature16961 Xin L, Song W, Cao Z, Zhang J (2021) Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In: 35th AAAI conference on artificial intelligence, AAAI 2021, pp 12042–12049. https://doi.org/10.1609/aaai.v35i13.17430 Chen X, Tian Y (2019) Learning to perform local rewriting for combinatorial optimization. In: Advances in neural information processing systems Lu H, Zhang X, Yang S (2018) A learning-based iterative method for solving vehicle routing problems. Iclr 2020. 3, pp 1–13 Zheng J, He K, Zhou J, Jin Y, Li CM (2021) Combining reinforcement learning with Lin-Kernighan-Helsgaun algorithm for the traveling salesman problem. In: 35th AAAI conference on artificial intelligence, AAAI 2021, pp 12445–12452. https://doi.org/10.1609/aaai.v35i14.17476 Delarue A, Anderson R, Tjandraatmadja C (2020) Reinforcement learning with combinatorial actions: an application to vehicle routing. Adv. Neural Inf. Process. Syst. 2020-Decem Cappart Q, Moisan T, Rousseau LM, Prémont-Schwarz I, Cire AA (2021) Combining reinforcement learning and constraint programming for combinatorial optimization. In: 35th AAAI Conference on Artificial Intelligence, AAAI 2021, pp 3677–3687. https://doi.org/10.1609/aaai.v35i5.16484 Zong Z, Wang H, Wang J, Zheng M, Li Y (2022) RBG: hierarchically solving large-scale routing problems in logistic systems via reinforcement learning. Association for Computing Machinery. https://doi.org/10.1145/3534678.3539037 Yan D, Weng J, Huang S, Li C, Zhou Y, Su H, Zhu J (2022) Deep reinforcement learning with credit assignment for combinatorial optimization. Pattern Recognit 124:108466. https://doi.org/10.1016/j.patcog.2021.108466 Wang Q, Blackley SV, Tang C (2022) Generative adversarial imitation learning to search in branch-and-bound algorithms. In: International conference on database systems for advanced applications. Springer International Publishing, pp 673–680. https://doi.org/10.1007/978-3-031-00126-0_51 Hottung A, Kwon Y-D, Tierney K (2022) Efficient active search for combinatorial optimization problems. Iclr 2022, pp 1–10 Cai X, Xia C, Zhang Q, Mei Z, Hu H, Wang L, Hu J (2021) The collaborative local search based on dynamic-constrained decomposition with grids for combinatorial multiobjective optimization. IEEE Trans Cybern 51:2639–2650. https://doi.org/10.1109/TCYB.2019.2931434 Domínguez-Ríos MÁ, Chicano F, Alba E (2021) Effective anytime algorithm for multiobjective combinatorial optimization problems. Inf Sci (Ny) 565:210–228. https://doi.org/10.1016/j.ins.2021.02.074 Yu JJQ, Yu W, Gu J (2019) Online vehicle routing with neural combinatorial optimization and deep reinforcement learning. IEEE Trans Intell Transp Syst 20:3806–3817. https://doi.org/10.1109/TITS.2019.2909109 Yin F, Zhao Y (2022) Distributionally robust equilibrious hybrid vehicle routing problem under twofold uncertainty. Inf Sci (Ny) 609:1239–1255. https://doi.org/10.1016/j.ins.2022.07.140 Zeng, H., Zhou, H., Srivastava, A., Kannan, R., Prasanna, V. (2020) GraphSAINT: Graph Sampling Based Inductive Learning Method. Iclr. 3, pp 415–422 Menda K, Chen YC, Grana J, Bono JW, Tracey BD, Kochenderfer MJ, Wolpert D (2019) Deep reinforcement learning for event-driven multi-agent decision processes. IEEE Trans Intell Transp Syst 20:1259–1268. https://doi.org/10.1109/TITS.2018.2848264 Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P, Oh J, Horgan D, Kroiss M, Danihelka I, Huang A, Sifre L, Cai T, Agapiou JP, Jaderberg M, Vezhnevets AS, Leblond R, Pohlen T, Dalibard V, Budden D, Sulsky Y, Molloy J, Paine TL, Gulcehre C, Wang Z, Pfaff T, Wu Y, Ring R, Yogatama D, Wünsch D, McKinney K, Smith O, Schaul T, Lillicrap T, Kavukcuoglu K, Hassabis D, Apps C, Silver D (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575:350–354. https://doi.org/10.1038/s41586-019-1724-z Liu A, Chen J, Yu M, Zhai Y, Zhou X, Liu J (2020) Watch the unobserved: a simple approach to parallelizing Monte Carlo Tree Search. Iclr, pp 1–21 Wang J, Wu N, Zhao WX, Peng F, Lin X (2019) Empowering A* search algorithms with neural networks for personalized route recommendation. Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min, pp 539–547. https://doi.org/10.1145/3292500.3330824 Xin L, Song W, Cao Z, Zhang J (2021) Step-wise deep learning models for solving routing problems. IEEE Trans Ind Inform 17:4861–4871. https://doi.org/10.1109/TII.2020.3031409 Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30 Wang H, Yu D, Li Y, Li Z, Wang G (2018) Multi-label online streaming feature selection based on spectral granulation and mutual information. Springer International Publishing.https://doi.org/10.1007/978-3-319-99368-3_17