Time Delay Estimation from Mixed Multispeaker Speech Signals Using Single Frequency Filtering

Circuits, Systems, and Signal Processing - Volume 39 - Pages 1988-2005 - 2019
B. H. V. S. Narayana Murthy1, B. Yegnanarayana2, Sudarsana Reddy Kadiri3
1Research Centre Imarat, Hyderabad, India
2Speech Processing Laboratory, International Institute of Information Technology, Hyderabad, India
3Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland

Abstract

A method is proposed for time delay estimation (TDE) from mixed source (speaker) signals collected at two spatially separated microphones. The key idea in this proposal is that the cross-correlation between corresponding segments of the mixed source signals is computed using the outputs of single frequency filtering (SFF) obtained at several frequencies, rather than using the collected waveforms directly. The advantage of the SFF output is that it contains high signal-to-noise ratio regions in both the time and frequency domains. It also provides multiple pieces of evidence, one from each SFF output, and these are combined to make the TDE robust. The estimated time delays can be used to determine the number of speakers present in the mixed signals. The TDE is shown to be robust against different types and levels of degradation. Results are shown for actual mixed signals collected at two spatially separated microphones in a live laboratory environment, where the mixed signals contain speech from several spatially distributed speakers.
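As a rough illustration of the idea described above (not the authors' implementation), the sketch below computes SFF outputs at several frequencies using the standard single-pole SFF formulation (frequency-shifting the component at f_k to f_s/2 and filtering with H(z) = 1/(1 + r z^{-1}), r close to 1) and pools normalized cross-correlations of the SFF envelopes by summation. The function names, the pooling by summation, and parameter values such as r = 0.995 are assumptions made for illustration; the paper's exact combination strategy may differ.

```python
# Minimal sketch of SFF-based time delay estimation (illustrative only).
import numpy as np
from scipy.signal import lfilter

def sff_output(x, fs, fk, r=0.995):
    """Single frequency filtering (SFF) of signal x at frequency fk (Hz).

    The signal is frequency-shifted so that the component at fk lands at
    fs/2, then filtered with the single-pole filter H(z) = 1/(1 + r z^-1),
    whose pole at z = -r (r close to 1) resonates at fs/2.
    """
    n = np.arange(len(x))
    shifted = x * np.exp(1j * (np.pi - 2.0 * np.pi * fk / fs) * n)
    return lfilter([1.0], [1.0, r], shifted)  # complex SFF output

def estimate_delay(x1, x2, fs, freqs, max_lag):
    """Estimate the dominant delay (in samples) between x1 and x2 by
    cross-correlating SFF envelopes at several frequencies and summing
    the normalized correlations (one plausible way to pool the multiple
    evidences; secondary peaks would suggest additional speakers)."""
    assert len(x1) == len(x2)
    mid = len(x1) - 1                       # zero-lag index of 'full' output
    acc = np.zeros(2 * max_lag + 1)
    for fk in freqs:
        e1 = np.abs(sff_output(x1, fs, fk))  # SFF envelope at microphone 1
        e2 = np.abs(sff_output(x2, fs, fk))  # SFF envelope at microphone 2
        e1 -= e1.mean()
        e2 -= e2.mean()
        cc = np.correlate(e1, e2, mode="full")
        cc /= np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12
        acc += cc[mid - max_lag : mid + max_lag + 1]
    lags = np.arange(-max_lag, max_lag + 1)
    return lags[np.argmax(acc)]
```

For instance, with fs = 16 kHz, SFF frequencies spaced every few hundred hertz across the speech band, and max_lag set by the microphone spacing, the location of the peak in the pooled correlation gives the delay estimate; under the paper's premise, distinct peaks would correspond to distinct spatially distributed speakers.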
