A dynamic multi-channel speech enhancement system for distributed microphones in a car environment
Abstract
Supporting multiple active speakers in automotive hands-free or speech dialog applications is of interest, not least for reasons of comfort. Therefore, a multi-channel system for enhancing speech signals captured by distributed distant microphones in a car environment is presented. Each potential speaker in the car has a dedicated directional microphone close to their seat that captures the corresponding speech signal. The aim of the overall system is twofold. On the one hand, the signals of an arbitrary pre-defined subset of speakers can be combined, e.g., to create an output signal for a far-end communication partner in a hands-free telephone conference call. On the other hand, annoying cross-talk components from interfering sound sources are to be eliminated from the multiple different mixed output signals, since other hands-free applications may be active in parallel. The system comprises several signal processing stages. A dedicated block for interfering speaker cancellation attenuates the cross-talk components of undesired speech. Further signal enhancement reduces residual cross-talk and background noise. Subsequently, a dynamic signal combination stage merges the processed single-microphone signals into appropriate mixed signals at the system output, which may be passed to applications such as telephony or a speech dialog system. Speaker activity detection based on signal power ratios between the individual microphone signals provides a robust control mechanism for the whole system. The proposed system can be configured dynamically and has been evaluated for a car setup with four speakers in the cabin under various noise conditions.
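The abstract states only that speaker activity detection is based on signal power ratios between the microphone signals; it does not give the decision rule. The following Python sketch illustrates one simple way such a ratio test could look, and should not be read as the authors' algorithm. The function names, the 6 dB threshold, and the choice of comparing each channel against the strongest other channel are all assumptions made for illustration.

```python
import math


def frame_power(samples):
    """Mean squared amplitude of one frame of samples from one microphone."""
    return sum(s * s for s in samples) / len(samples)


def power_ratios_db(powers, floor=1e-12):
    """For each channel, the ratio (in dB) of its power to the strongest
    other channel. Requires at least two channels. `floor` (an assumed
    value) avoids division by zero and log of zero on silent frames."""
    ratios = []
    for i, p in enumerate(powers):
        strongest_other = max(q for j, q in enumerate(powers) if j != i)
        ratio = max(p, floor) / max(strongest_other, floor)
        ratios.append(10.0 * math.log10(ratio))
    return ratios


def detect_active_speaker(powers, threshold_db=6.0):
    """Return the index of the channel that dominates all others by at
    least `threshold_db` (an assumed value), or None if no channel does."""
    ratios = power_ratios_db(powers)
    best = max(range(len(powers)), key=lambda i: ratios[i])
    return best if ratios[best] >= threshold_db else None


# Hypothetical per-seat frame powers for a four-microphone setup.
print(detect_active_speaker([1.0, 0.1, 0.05, 0.02]))  # channel 0 dominates
print(detect_active_speaker([1.0, 0.9, 0.1, 0.1]))    # no clear dominance
```

A test of this kind treats a speaker as active only when the dedicated microphone is clearly louder than every other microphone, which is why cross-talk from a neighboring seat alone would not trigger it. A practical system would additionally need smoothing over time, noise-dependent thresholds, and handling of simultaneous speakers, none of which is sketched here.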