Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator

Speech Communication - Volume 54 - Pages 282-305 - 2012
Kuldip Paliwal¹, Belinda Schwerin¹, Kamil Wójcicki¹
¹ Signal Processing Laboratory, Griffith School of Engineering, Griffith University, Nathan, QLD 4111, Australia

References

Atlas, 2003, Joint acoustic and modulation frequency, EURASIP J. Appl. Signal Process., 2003, 668, 10.1155/S1110865703305013
Atlas, L., Li, Q., Thompson, J., 2004. Homomorphic modulation spectra. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Process. (ICASSP), Vol. 2, Montreal, Quebec, Canada, pp. 761–764.
Berouti, M., Schwartz, R., Makhoul, J., 1979. Enhancement of speech corrupted by acoustic noise. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Process. (ICASSP), Vol. 4, Washington, DC, USA, pp. 208–211.
Boll, 1979, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust. Speech Signal Process., ASSP-27, 113, 10.1109/TASSP.1979.1163209
Breithaupt, 2011, Analysis of the decision-directed SNR estimator for speech enhancement with respect to low-SNR and transient conditions, IEEE Trans. Audio Speech Lang. Process., 19, 277, 10.1109/TASL.2010.2047681
Cappe, 1994, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor, IEEE Trans. Speech Audio Process., 2, 345, 10.1109/89.279283
Cohen, 2005, Relaxed statistical model for speech enhancement and a priori SNR estimation, IEEE Trans. Speech Audio Process., 13, 870, 10.1109/TSA.2005.851940
Cohen, 2002, Noise estimation by minima controlled recursive averaging for robust speech enhancement, IEEE Signal Process. Lett., 9, 12, 10.1109/97.988717
Drullman, 1994, Effect of reducing slow temporal modulations on speech reception, J. Acoust. Soc. Amer., 95, 2670, 10.1121/1.409836
Ephraim, 1984, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process., ASSP-32, 1109, 10.1109/TASSP.1984.1164453
Ephraim, 1985, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process., ASSP-33, 443, 10.1109/TASSP.1985.1164550
Falk, T.H., Chan, W.-Y., 2008. A non-intrusive quality measure of dereverberated speech. In: Proc. Internat. Workshop Acoust. Echo Noise Control.
Falk, 2010, Modulation spectral features for robust far-field speaker identification, IEEE Trans. Audio Speech Lang. Process., 18, 90, 10.1109/TASL.2009.2023679
Falk, T., Stadler, S., Kleijn, W.B., Chan, W.-Y., 2007. Noise suppression based on extending a speech-dominated modulation band. In: Proc. ISCA Conf. Internat. Speech Commun. Assoc. (INTERSPEECH), Antwerp, Belgium, pp. 970–973.
Falk, 2010, A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech, IEEE Trans. Audio Speech Lang. Process., 18, 1766, 10.1109/TASL.2010.2052247
Gray, 1980, Distortion measures for speech processing, IEEE Trans. Acoust. Speech Signal Process., ASSP-28, 367, 10.1109/TASSP.1980.1163421
Greenberg, S., Kingsbury, B., 1997. The modulation spectrogram: In pursuit of an invariant representation of speech. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Process. (ICASSP), Vol. 3, Munich, Germany, pp. 1647–1650.
Hermansky, H., Wan, E., Avendano, C., 1995. Speech enhancement based on temporal processing. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Process. (ICASSP), Vol. 1, Detroit, MI, USA, pp. 405–408.
Hu, 2007, Subjective comparison and evaluation of speech enhancement algorithms, Speech Comm., 49, 588, 10.1016/j.specom.2006.12.006
Huang, 2001
ITU-T P.835, 2007. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm: Additional provisions for non-stationary noise suppressors. ITU-T P.835 Recommendation, Amendment 1.
Kim, 2004, A cue for objective speech quality estimation in temporal envelope representations, IEEE Signal Process. Lett., 11, 849, 10.1109/LSP.2004.835466
Kim, 2005, ANIQUE: An auditory model for single-ended speech quality estimation, IEEE Trans. Speech Audio Process., 13, 821, 10.1109/TSA.2005.851924
Kingsbury, 1998, Robust speech recognition using the modulation spectrogram, Speech Comm., 25, 117, 10.1016/S0167-6393(98)00032-6
Lim, 1979, Enhancement and bandwidth compression of noisy speech, Proc. IEEE, 67, 1586, 10.1109/PROC.1979.11540
Loizou, 2005, Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum, IEEE Trans. Speech Audio Process., 13, 857, 10.1109/TSA.2005.851929
Loizou, 2007
Lyons, J., Paliwal, K., 2008. Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement. In: Proc. ISCA Conf. Internat. Speech Commun. Assoc. (INTERSPEECH), Brisbane, Australia, pp. 387–390.
Martin, R., 1994. Spectral subtraction based on minimum statistics. In: Proc. EURASIP European Signal Process. Conf. (EUSIPCO), Edinburgh, Scotland, pp. 1182–1185.
Martin, 2001, Noise power spectral density estimation based on optimal smoothing and minimum statistics, IEEE Trans. Speech Audio Process., 9, 504, 10.1109/89.928915
McAulay, 1980, Speech enhancement using a soft-decision noise suppression filter, IEEE Trans. Acoust. Speech Signal Process., 28, 137, 10.1109/TASSP.1980.1163394
Paliwal, 2008, Effect of analysis window duration on speech intelligibility, IEEE Signal Process. Lett., 15, 785, 10.1109/LSP.2008.2005755
Paliwal, 2010, Single-channel speech enhancement using spectral subtraction in the short-time modulation domain, Speech Comm., 52, 450, 10.1016/j.specom.2010.02.004
Paliwal, 2011, Role of modulation magnitude and phase spectrum towards speech intelligibility, Speech Comm., 53, 327, 10.1016/j.specom.2010.10.004
Picone, 1993, Signal modeling techniques in speech recognition, Proc. IEEE, 81, 1215, 10.1109/5.237532
Quackenbush, 1988
Quatieri, 2002
Rabiner, 2010
Rix, A., Beerends, J., Hollier, M., Hekstra, A., 2001. Perceptual Evaluation of Speech Quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. ITU-T Recommendation P.862.
Scalart, P., Filho, J., 1996. Speech enhancement based on a priori signal to noise estimation. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Process. (ICASSP), Vol. 2, Atlanta, Georgia, USA, pp. 629–632.
Shannon, B., Paliwal, K., 2006. Role of phase estimation in speech enhancement. In: Proc. Internat. Conf. on Spoken Language Process. (ICSLP), Pittsburgh, PA, USA, pp. 1423–1426.
Sim, 1998, A parametric formulation of the generalized spectral subtraction method, IEEE Trans. Speech Audio Process., 6, 328, 10.1109/89.701361
So, 2011, Modulation-domain Kalman filtering for single-channel speech enhancement, Speech Comm., 53, 818, 10.1016/j.specom.2011.02.001
Sohn, 1999, A statistical model-based voice activity detection, IEEE Signal Process. Lett., 6, 1, 10.1109/97.736233
Thompson, J., Atlas, L., 2003. A non-uniform modulation transform for audio coding with increased time resolution. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Process. (ICASSP), Vol. 5, Hong Kong, pp. 397–400.
Tyagi, V., McCowan, I., Bourlard, H., Misra, H., 2003. On factorizing spectral dynamics for robust speech recognition. In: Proc. ISCA European Conf. on Speech Commun. and Technology (EUROSPEECH), Geneva, Switzerland, pp. 981–984.
Vary, 2006
Virag, 1999, Single channel speech enhancement based on masking properties of the human auditory system, IEEE Trans. Speech Audio Process., 7, 126, 10.1109/89.748118
Wang, 1982, The unimportance of phase in speech enhancement, IEEE Trans. Acoust. Speech Signal Process., ASSP-30, 679, 10.1109/TASSP.1982.1163920
Wiener, 1949
Wu, S., Falk, T., Chan, W.-Y., 2009. Automatic recognition of speech emotion using long-term spectro-temporal features. In: Internat. Conf. on Digital Signal Process.
Zadeh, 1950, Frequency analysis of variable networks, Proc. IRE, 38, 291, 10.1109/JRPROC.1950.231083