Speech improvement in noisy reverberant environments using virtual microphones along with proposed array geometry

EURASIP Journal on Advances in Signal Processing, Volume 2022, Pages 1-20 (2022)
Mohammad Ebrahim Sadeghi1,2, Hamid Sheikhzadeh1, Mohammad Javad Emadi1
1Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
2Department of Broadcast Engineering, IRIB University, Tehran, Iran

Abstract

This paper proposes a novel approach for enhancing the speech of a single speaker in noisy, reverberant environments. The approach uses a beamformer with a large number of virtual microphones placed in a suggested arrangement on an open sphere. The virtual microphone signals are synthesized using non-parametric sound field reproduction in the spherical harmonics domain together with the popular weighted prediction error (WPE) method. This yields accurate beam steering towards a known source location with higher directivity. The approach is shown to be effective not only in increasing the directivity factor but also in improving speech quality as measured by perceptual quality metrics such as PESQ. Compared with current research on beamformer-based speech enhancement, our experiments show greater noise and reverberation suppression as well as improved quality in the enhanced speech samples, owing to the use of virtual beam rotation in the fixed beamformer.
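To make the directivity factor mentioned in the abstract concrete, the following is a minimal sketch of how it is commonly computed for a fixed beamformer on an open sphere. It is not the paper's method: it assumes a plain delay-and-sum beamformer, far-field plane waves, a spherically isotropic (diffuse) noise field, and a Fibonacci-lattice microphone layout. The function names, the 4.2 cm radius, and the 4 kHz test frequency are illustrative assumptions, and no virtual-microphone synthesis is performed.

```python
import numpy as np


def fibonacci_sphere(n, radius):
    """Near-uniform points on a sphere (hypothetical layout, not the paper's geometry)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    return radius * np.stack(
        [np.sin(polar) * np.cos(azimuth),
         np.sin(polar) * np.sin(azimuth),
         np.cos(polar)],
        axis=1,
    )


def directivity_factor(positions, look_dir, freq_hz, c=343.0):
    """Directivity factor of a delay-and-sum beamformer in diffuse noise.

    DF = |w^H d|^2 / (w^H Gamma w), where d is the far-field steering
    vector and Gamma is the diffuse-field coherence matrix.
    """
    positions = np.asarray(positions, dtype=float)
    look_dir = np.asarray(look_dir, dtype=float)
    look_dir = look_dir / np.linalg.norm(look_dir)
    k = 2.0 * np.pi * freq_hz / c  # wavenumber in rad/m

    # Steering vector for a plane wave arriving from look_dir.
    d = np.exp(1j * k * (positions @ look_dir))
    # Delay-and-sum weights, scaled so that w^H d = 1.
    w = d / len(d)

    # Coherence of omnidirectional sensors in an isotropic field:
    # sin(k r) / (k r). np.sinc(x) is sin(pi x) / (pi x), hence the / pi.
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    gamma = np.sinc(k * dist / np.pi)

    numerator = np.abs(np.vdot(w, d)) ** 2
    denominator = np.real(np.conj(w) @ gamma @ w)
    return float(numerator / denominator)


if __name__ == "__main__":
    look = np.array([1.0, 0.0, 0.0])
    for n_mics in (8, 32):
        mics = fibonacci_sphere(n_mics, radius=0.042)
        df = directivity_factor(mics, look, freq_hz=4000.0)
        print(f"{n_mics} microphones: DF = {df:.2f} ({10 * np.log10(df):.2f} dB)")
```

Running the script prints the directivity factor for two array sizes under these assumptions, which gives a rough way to see how the sensor count on the sphere affects directivity for this simple beamformer.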
