Optimal Strategy for Aircraft Pursuit-evasion Games via Self-play Iteration

Xin Wang (1,2), Qing-Lai Wei (1,2,3), Tao Li (1,2), Jie Zhang (1,2)
(1) State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
(2) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
(3) Institute of Systems Engineering, Macau University of Science and Technology, Macau, China

Abstract

In this paper, the pursuit-evasion game with state and control constraints is solved with an iterative self-play technique to obtain the Nash equilibrium of the pursuer and the evader. Under the condition that the Hamiltonian formed by means of Pontryagin’s maximum principle has a unique solution, it can be proven that the iterative control law converges to the Nash equilibrium solution. However, the strong nonlinearity of the ordinary differential equations formulated by Pontryagin’s maximum principle makes the control policy difficult to determine. Moreover, the system dynamics employed in this manuscript contain a high-dimensional state vector with constraints. In practical applications, such as the control of aircraft, the available overload is limited. Therefore, in this paper, we consider the optimal strategy of pursuit-evasion games with a constant constraint on the control, while some state variables are restricted by a function of the input. To address these challenges, the optimal control problems are transformed into nonlinear programming problems through the direct collocation method. Finally, two numerical cases of the aircraft pursuit-evasion scenario are given to demonstrate the effectiveness of the presented method in obtaining the optimal control of both the pursuer and the evader.
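The self-play idea described in the abstract can be illustrated on a deliberately small example. The sketch below is not the aircraft model or the authors' algorithm: it assumes a hypothetical static zero-sum game with scalar controls, a payoff that is strictly convex in the pursuer's control u and strictly concave in the evader's control v (so the saddle point is unique), and a constant bound on both controls. Each player's best response is available in closed form here, whereas in the paper each best response is an optimal control problem transcribed into a nonlinear program by direct collocation. All function names and coefficients are assumptions chosen for illustration.

```python
# Toy self-play iteration for a bounded-control zero-sum game (illustrative only).
# Assumed payoff: J(u, v) = u^2 + 2*a*u*v - v^2 + b*u + c*v
# The pursuer picks u to minimize J; the evader picks v to maximize J.

A, B, C = 0.5, -1.0, 1.0  # hypothetical coefficients; |A| < 1 makes the iteration contract


def clamp(x, limit):
    """Project a scalar control onto the box [-limit, limit]."""
    return max(-limit, min(limit, x))


def payoff(u, v):
    """Value of the assumed quadratic payoff J(u, v)."""
    return u * u + 2.0 * A * u * v - v * v + B * u + C * v


def best_pursuer(v, limit):
    """Minimizer of J over u in [-limit, limit] for a fixed evader control v."""
    # dJ/du = 2u + 2*A*v + B = 0; clamping is exact because J is convex in u.
    return clamp(-A * v - B / 2.0, limit)


def best_evader(u, limit):
    """Maximizer of J over v in [-limit, limit] for a fixed pursuer control u."""
    # dJ/dv = 2*A*u - 2v + C = 0; clamping is exact because J is concave in v.
    return clamp(A * u + C / 2.0, limit)


def self_play(limit, u=0.0, v=0.0, tol=1e-10, max_iter=200):
    """Alternate best responses until neither control changes by more than tol."""
    for k in range(max_iter):
        u_new = best_pursuer(v, limit)
        v_new = best_evader(u_new, limit)
        if abs(u_new - u) < tol and abs(v_new - v) < tol:
            return u_new, v_new, k + 1
        u, v = u_new, v_new
    raise RuntimeError("self-play iteration did not converge")


if __name__ == "__main__":
    for limit in (1.0, 0.5):
        u_star, v_star, iters = self_play(limit)
        print(f"limit={limit}: u*={u_star:.6f}, v*={v_star:.6f}, iterations={iters}")
```

With a bound of 1.0 the equilibrium lies inside the box, while with a bound of 0.5 the evader's control saturates at the bound, which mirrors in a very reduced form the role of the control constraint discussed in the abstract.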
