Model-based optimal action selection for Dyna-Q reverberation suppression cognitive sonar
Abstract
In shallow water, the Doppler shift of a low-speed target is often masked by reverberation Doppler-spread clutter. This clutter is generated by underwater scatterers and makes Doppler estimation more difficult. To address this problem, this paper proposes a reverberation-target resolution function based on a statistical model of the Doppler-spread clutter. The function uses the width of the reverberation Doppler clutter to determine whether the target is discriminable and adjusts the waveform parameters accordingly. The reverberation Doppler-spread clutter also varies in time and space and is affected by grazing angle, waves, wind speed, fish, and other factors, so the sonar waveform parameters must be adjusted continually. The paper therefore combines a reinforcement-learning-based cognitive sonar with the reverberation-target resolution function to evaluate different waveforms in different environments. As a result, the sonar can adjust its waveform parameters in real time and obtain the optimal waveform for each environment. The paper also optimizes the action selection strategy of Dyna-Q reinforcement learning and proposes a model-based maximum action selection Dyna-Q algorithm (Dyna-Q-Max-Action). Compared with the traditional Dyna-Q and Q-learning algorithms, the proposed algorithm needs fewer episodes. Finally, numerical simulations verify the effectiveness of the proposed algorithm.
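For readers unfamiliar with Dyna-Q, the following is a minimal sketch of the standard tabular algorithm (direct Q-learning update, model learning, then planning from the learned model). The abstract does not give the update rule of Dyna-Q-Max-Action, so the `greedy_planning` flag is only one possible reading of "model-based maximum action selection" and should not be taken as the paper's method. The function names, the hyperparameters, and the toy chain environment `chain_step` are all hypothetical and stand in for the sonar waveform-selection problem.

```python
import random


def chain_step(state, action, n_states=6):
    """Hypothetical toy environment: a 1-D chain, action 0 = left, 1 = right.
    Reaching the last state gives reward 1; every other move gives 0."""
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    return (1.0 if nxt == n_states - 1 else 0.0), nxt


def dyna_q(step, n_states, n_actions, start, goal, episodes=50,
           planning_steps=10, alpha=0.5, gamma=0.95, epsilon=0.1,
           greedy_planning=False, seed=0, max_steps=1000):
    """Tabular Dyna-Q for a deterministic environment.

    greedy_planning=False: planning replays uniformly sampled (state, action)
    pairs from the model (the textbook variant).
    greedy_planning=True: ASSUMPTION, not taken from the paper. Planning
    samples a visited state and replays its highest-valued tried action.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (reward, next_state)
    steps_per_episode = []

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[s])
        return rng.choice([a for a in range(n_actions) if q[s][a] == best])

    def update(s, a, r, s2):
        # One-step Q-learning backup.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])

    for _ in range(episodes):
        s, steps = start, 0
        while s != goal and steps < max_steps:
            # Epsilon-greedy action selection in the real environment.
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)       # direct reinforcement learning
            model[(s, a)] = (r, s2)   # model learning
            for _ in range(planning_steps):  # planning with simulated experience
                if greedy_planning:
                    ps = rng.choice(sorted({k[0] for k in model}))
                    tried = [x for x in range(n_actions) if (ps, x) in model]
                    pa = max(tried, key=lambda x: q[ps][x])
                else:
                    ps, pa = rng.choice(sorted(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `dyna_q(chain_step, n_states=6, n_actions=2, start=0, goal=5)` returns the learned Q-table and the number of real steps taken in each episode. Setting `planning_steps=0` reduces the sketch to plain Q-learning, which gives a simple baseline for comparing how many episodes each variant needs on this toy problem.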