Sequencing of multi-robot behaviors using reinforcement learning

Control Theory and Technology - Volume 19 - Pages 529-537 - 2021
Pietro Pierpaoli1, Thinh T. Doan2, Justin Romberg1, Magnus Egerstedt3
1School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA
2Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, USA
3Samueli School of Engineering, University of California, Irvine, USA

Abstract

Given a collection of parameterized multi-robot controllers associated with individual behaviors designed for particular tasks, this paper considers the problem of how to sequence and instantiate the behaviors for the purpose of completing a more complex, overarching mission. In addition, uncertainties about the environment or even the mission specifications may require the robots to learn, in a cooperative manner, how best to sequence the behaviors. In this paper, we approach this problem by using reinforcement learning to approximate the solution to the computationally intractable sequencing problem, combined with an online gradient descent approach to selecting the individual behavior parameters. Transitions among behaviors are triggered automatically once each behavior reaches a desired performance level relative to a task performance cost. To illustrate the effectiveness of the proposed method, it is implemented on a team of differential-drive robots for solving two different missions, namely, convoy protection and object manipulation.
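To make the interplay of the three ingredients in the abstract concrete, the following is a minimal, self-contained sketch (not the authors' implementation): a tabular Q-learning agent chooses which behavior to run next, an online gradient-descent step tunes that behavior's scalar parameter, and the switch to the next behavior is triggered once a task performance cost drops below a threshold. All names and quantities here (run_behavior, task_cost, NUM_BEHAVIORS, the cost model, thresholds) are illustrative assumptions, not quantities from the paper.

```python
# Hedged sketch: RL over behavior sequencing + online gradient descent on
# behavior parameters + cost-triggered transitions. Everything below is a
# stand-in model, not the paper's robots, behaviors, or cost functions.
import numpy as np

NUM_BEHAVIORS = 4        # hypothetical library of parameterized behaviors
NUM_STATES = 10          # coarse, discretized mission states
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2   # Q-learning step size, discount, exploration
ETA = 0.05               # gradient-descent step size for behavior parameters
COST_THRESHOLD = 0.1     # behavior is "done" when its task cost falls below this

rng = np.random.default_rng(0)
Q = np.zeros((NUM_STATES, NUM_BEHAVIORS))   # state x behavior value table
theta = np.ones(NUM_BEHAVIORS)              # one scalar parameter per behavior


def task_cost(behavior, param, state):
    """Stand-in for the task performance cost of running `behavior` with `param`."""
    target = (behavior + 1) / NUM_BEHAVIORS
    return (param - target) ** 2 + 0.01 * state / NUM_STATES


def run_behavior(behavior, param, state):
    """Stand-in for executing the behavior; returns (next_state, reward)."""
    cost = task_cost(behavior, param, state)
    next_state = min(state + 1, NUM_STATES - 1) if cost < COST_THRESHOLD else state
    return next_state, -cost


for episode in range(200):
    state = 0
    while state < NUM_STATES - 1:
        # epsilon-greedy selection of the next behavior in the sequence
        if rng.random() < EPS:
            behavior = int(rng.integers(NUM_BEHAVIORS))
        else:
            behavior = int(np.argmax(Q[state]))

        # online gradient descent on the behavior parameter until the task
        # cost is low enough to trigger the transition to the next behavior
        for _ in range(100):
            if task_cost(behavior, theta[behavior], state) < COST_THRESHOLD:
                break
            h = 1e-4  # finite-difference gradient of the cost w.r.t. the parameter
            grad = (task_cost(behavior, theta[behavior] + h, state)
                    - task_cost(behavior, theta[behavior] - h, state)) / (2 * h)
            theta[behavior] -= ETA * grad

        next_state, reward = run_behavior(behavior, theta[behavior], state)

        # standard Q-learning update on the sequencing decision
        Q[state, behavior] += ALPHA * (
            reward + GAMMA * Q[next_state].max() - Q[state, behavior])
        state = next_state

print("Learned behavior sequence:", [int(np.argmax(Q[s])) for s in range(NUM_STATES - 1)])
```

The separation of time scales is the design point the sketch tries to convey: parameter tuning happens inside each behavior (inner loop), while the sequencing decision is learned across behaviors (outer Q-learning update), with the cost threshold acting as the automatic transition trigger described in the abstract.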
