A deep reinforcement learning approach for multi-agent mobile robot patrolling

Meghdeep Jana¹, Leena Vachhani¹, Arpita Sinha¹
¹Autonomous Robots and Multi-agent Systems Laboratory, Systems and Control Engineering, Indian Institute of Technology Bombay, Mumbai, India

Abstract

Patrolling strategies primarily aim to minimise the time taken to visit specific locations and cover an area. Intelligent agents have proven useful for automating patrolling and for analysing patrolling patterns. However, practical scenarios demand that these strategies adapt to varying conditions and remain robust against adversaries. Traditional Q-learning-based patrolling keeps track of all possible states and actions in a Q-table, making it susceptible to the curse of dimensionality. To make multi-agent patrolling adaptive across scenarios represented as graphs, we propose a formulation of the Markov Decision Process (MDP) with state representations that can be used by Deep Reinforcement Learning (DRL) approaches such as Deep Q-Networks (DQN). The implemented DQN, trained with a novel reward function, can estimate the MDP using a finite-length state vector. The proposed state-space representation is independent of the number of nodes in the graph, thereby addressing scalability with respect to graph size. We also propose a reward function that penalises the agents for a lack of global coordination while providing immediate local feedback on their actions. As independent policy learners subject to the MDP and reward function, the DRL agents form a collaborative patrolling strategy. The learned policies generalise and adapt to multiple behaviours without being explicitly trained or designed to do so. We provide an empirical analysis showing that the strategy adapts to changes in agents’ positions, non-uniform node visit frequency requirements, changes in the graph structure representing the environment, and induced randomness in the trajectories. DRL patrolling thus proves to be a promising strategy for intelligent agents, with the potential to be scalable, adaptive, and robust against adversaries.
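
The abstract does not give the exact state vector or reward function, so the following is only an illustrative sketch of the two ideas it names: a state whose length does not depend on the number of nodes, and a reward that mixes immediate local feedback with a global penalty. Everything here is an assumption rather than the paper's formulation: node idleness (time since the last visit) as the underlying quantity, the padding value `-1`, the `max_degree` bound, the `global_weight` coefficient, and all function names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbouring nodes


def step_idleness(idleness: Dict[int, int], visited: Set[int]) -> Dict[int, int]:
    """Advance one time step: visited nodes reset to 0, all others age by 1."""
    return {n: 0 if n in visited else t + 1 for n, t in idleness.items()}


def local_state(graph: Graph, idleness: Dict[int, int],
                node: int, max_degree: int) -> List[int]:
    """Hypothetical fixed-length state: idleness of the agent's node and of its
    neighbours, padded with -1 up to max_degree. Its length is max_degree + 1
    regardless of how many nodes the graph has."""
    neigh = [idleness[n] for n in graph[node]][:max_degree]
    return [idleness[node]] + neigh + [-1] * (max_degree - len(neigh))


def reward(idleness: Dict[int, int], target: int,
           global_weight: float = 0.1) -> float:
    """Hypothetical reward: local term (idleness of the node the agent moves to)
    minus a global penalty proportional to the mean idleness of the graph."""
    mean_idle = sum(idleness.values()) / len(idleness)
    return idleness[target] - global_weight * mean_idle


# Small path-like graph with one extra leaf; an agent sits on node 0.
graph: Graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
idleness = step_idleness({n: 0 for n in graph}, visited={0})
state = local_state(graph, idleness, node=0, max_degree=3)
```

In this sketch the vector fed to a Q-network has `max_degree + 1` entries whether the graph has four nodes or four thousand, whereas a Q-table would need one row per distinct joint configuration of the whole graph.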
