Single-channel speech enhancement using spectral subtraction in the short-time modulation domain

Speech Communication - Volume 52, Issue 5 - Pages 450-475 - 2010
Kuldip K. Paliwal¹, Kamil Wójcicki¹, Belinda Schwerin¹
¹ Signal Processing Laboratory, Griffith School of Engineering, Griffith University, Nathan, QLD 4111, Australia

References

Allen, 1977, Short term spectral analysis, synthesis, and modification by discrete Fourier transform, IEEE Trans. Acoust. Speech Signal Process., 25, 235, 10.1109/TASSP.1977.1162950

Allen, 1977, A unified approach to short-time Fourier analysis and synthesis, Proc. IEEE, 65, 1558, 10.1109/PROC.1977.10770

Arai, T., Pavel, M., Hermansky, H., Avendano, C., 1996. Intelligibility of speech with filtered time trajectories of spectral envelopes. In: Proc. Internat. Conf. Spoken Language Process. (ICSLP). Philadelphia, PA, USA, pp. 2490–2493.

Atlas, L., 2003. Modulation spectral transforms: Application to speech separation and modification. Tech. Rep. 155, IEICE, Univ. Washington, Washington, WA, USA.

Atlas, L., Li, Q., Thompson, J., 2004. Homomorphic modulation spectra. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 2. Montreal, Quebec, Canada, pp. 761–764.

Atlas, 2003, Joint acoustic and modulation frequency, EURASIP J. Appl. Signal Process., 2003, 668, 10.1155/S1110865703305013

Atlas, L., Vinton, M., 2001. Modulation frequency and efficient audio coding. In: Proc. SPIE the International Society for Optical Engineering, vol. 4474. pp. 1–8.

Bacon, 1989, Modulation masking: Effects of modulation frequency, depth and phase, J. Acoust. Soc. Amer., 85, 2575, 10.1121/1.397751

Berouti, M., Schwartz, R., Makhoul, J., 1979. Enhancement of speech corrupted by acoustic noise. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 4. Washington, DC, USA, pp. 208–211.

Boll, 1979, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust. Speech Signal Process., ASSP-27, 113, 10.1109/TASSP.1979.1163209

Cappe, 1994, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor, IEEE Trans. Speech Audio Process., 2, 345, 10.1109/89.279283

Crochiere, 1980, A weighted overlap-add method of short-time Fourier analysis/synthesis, IEEE Trans. Acoust. Speech Signal Process., ASSP-28, 99, 10.1109/TASSP.1980.1163353

Depireux, 2001, Spectrotemporal response field characterization with dynamic ripples in ferret primary auditory cortex, J. Neurophysiol., 85, 1220, 10.1152/jn.2001.85.3.1220

Drullman, 1994, Effect of reducing slow temporal modulations on speech reception, J. Acoust. Soc. Amer., 95, 2670, 10.1121/1.409836

Drullman, 1994, Effect of temporal envelope smearing on speech reception, J. Acoust. Soc. Amer., 95, 1053, 10.1121/1.408467

Ephraim, 1984, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process., ASSP-32, 1109, 10.1109/TASSP.1984.1164453

Ephraim, 1985, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process., ASSP-33, 443, 10.1109/TASSP.1985.1164550

Falk, T., Stadler, S., Kleijn, W.B., Chan, W.-Y., 2007. Noise suppression based on extending a speech-dominated modulation band. In: Proc. ISCA Conf. Internat. Speech Commun. Assoc. (INTERSPEECH). Antwerp, Belgium, pp. 970–973.

Goldsworthy, 2004, Analysis of speech-based speech transmission index methods with implications for nonlinear operations, J. Acoust. Soc. Amer., 116, 3679, 10.1121/1.1804628

Greenberg, S., Arai, T., 2001. The relation between speech intelligibility and the complex modulation spectrum. In: Proc. ISCA European Conf. Speech Commun. and Technology (EUROSPEECH). Aalborg, Denmark, pp. 473–476.

Griffin, 1984, Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoust. Speech Signal Process., ASSP-32, 236, 10.1109/TASSP.1984.1164317

Hasan, 2004, A modified a priori SNR for speech enhancement using spectral subtraction rules, IEEE Signal Process. Lett., 11, 450, 10.1109/LSP.2004.824017

Hermansky, 1994, RASTA processing of speech, IEEE Trans. Speech Audio Process., 2, 578, 10.1109/89.326616

Hermansky, H., Wan, E., Avendano, C., 1995. Speech enhancement based on temporal processing. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 1. Detroit, MI, USA, pp. 405–408.

Houtgast, 1985, A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria, J. Acoust. Soc. Amer., 77, 1069, 10.1121/1.392224

Hu, 2004, Speech enhancement based on wavelet thresholding the multitaper spectrum, IEEE Trans. Speech Audio Process., 12, 59, 10.1109/TSA.2003.819949

Hu, 2007, Subjective comparison and evaluation of speech enhancement algorithms, Speech Comm., 49, 588, 10.1016/j.specom.2006.12.006

Kamath, S., Loizou, P., 2002. A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP). Orlando, FL, USA.

Kanedera, 1999, On the relative importance of various components of the modulation spectrum for automatic speech recognition, Speech Comm., 28, 43, 10.1016/S0167-6393(99)00002-3

Kim, 2004, A cue for objective speech quality estimation in temporal envelope representations, IEEE Signal Process. Lett., 11, 849, 10.1109/LSP.2004.835466

Kim, 2005, ANIQUE: An auditory model for single-ended speech quality estimation, IEEE Trans. Speech Audio Process., 13, 821, 10.1109/TSA.2005.851924

Kingsbury, 1998, Robust speech recognition using the modulation spectrogram, Speech Comm., 25, 117, 10.1016/S0167-6393(98)00032-6

Kinnunen, T., 2006. Joint acoustic-modulation frequency for speaker recognition. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 1. Toulouse, France, pp. 665–668.

Kinnunen, T., Lee, K., Li, H., 2008. Dimension reduction of the modulation spectrogram for speaker verification. In: Proc. ISCA Speaker and Language Recognition Workshop (ODYSSEY). Stellenbosch, South Africa.

Kowalski, 1996, Analysis of dynamic spectra in ferret primary auditory cortex: I. Characteristics of single unit responses to moving ripple spectra, J. Neurophysiol., 76, 3503, 10.1152/jn.1996.76.5.3503

Lim, 1979, Enhancement and bandwidth compression of noisy speech, Proc. IEEE, 67, 1586, 10.1109/PROC.1979.11540

Loizou, 2007

Lu, 2007, Reduction of musical residual noise for speech enhancement using masking properties and optimal smoothing, Pattern Recognition Lett., 28, 1300, 10.1016/j.patrec.2007.03.001

Lu, 2010, Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition, Speech Comm., 52, 1, 10.1016/j.specom.2009.08.006

Lyons, J., Paliwal, K., 2008. Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement. In: Proc. ISCA Conf. Internat. Speech Commun. Assoc. (INTERSPEECH). Brisbane, Australia, pp. 387–390.

Malayath, 2000, Data-driven temporal filters and alternatives to GMM in speaker verification, Digital Signal Process., 10, 55, 10.1006/dspr.1999.0363

Mesgarani, N., Shamma, S., 2005. Speech enhancement based on filtering the spectrotemporal modulations. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 1. Philadelphia, PA, USA, pp. 1105–1108.

Nadeu, 1997, Filtering the time sequences of spectral parameters for speech recognition, Speech Comm., 22, 315, 10.1016/S0167-6393(97)00030-7

Paliwal, 2008, Effect of analysis window duration on speech intelligibility, IEEE Signal Process. Lett., 15, 785, 10.1109/LSP.2008.2005755

Payton, 1999, A method to determine the speech transmission index from speech waveforms, J. Acoust. Soc. Amer., 106, 3637, 10.1121/1.428216

Payton, 2002, Computing the STI using speech as a probe stimulus, 125

Portnoff, 1981, Short-time Fourier analysis of sampled speech, IEEE Trans. Acoust. Speech Signal Process., ASSP-29, 364, 10.1109/TASSP.1981.1163580

Quatieri, 2002

Rix, A., Beerends, J., Hollier, M., Hekstra, A., 2001. Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. ITU-T Recommendation P.862.

Schreiner, 1986, Representation of amplitude modulation in the auditory cortex of the cat. I. The anterior auditory field (AAF), Hearing Res., 21, 227, 10.1016/0378-5955(86)90221-2

Shamma, 1996, Auditory cortical representation of complex acoustic spectra as inferred from the ripple analysis method, Network: Comput. Neural Syst., 7, 439, 10.1088/0954-898X/7/3/001

Shannon, B., Paliwal, K., 2006. Role of phase estimation in speech enhancement. In: Proc. Internat. Conf. Spoken Language Process. (ICSLP). Pittsburgh, PA, USA, pp. 1423–1426.

Sheft, 1990, Temporal integration in amplitude modulation detection, J. Acoust. Soc. Amer., 88, 796, 10.1121/1.399729

Steeneken, 1980, A physical method for measuring speech-transmission quality, J. Acoust. Soc. Amer., 67, 318, 10.1121/1.384464

Thompson, J., Atlas, L., 2003. A non-uniform modulation transform for audio coding with increased time resolution. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 5. Hong Kong, pp. 397–400.

Tyagi, V., McCowan, I., Bourlard, H., Misra, H., 2003. On factorizing spectral dynamics for robust speech recognition. In: Proc. ISCA European Conf. Speech Comm. and Technology (EUROSPEECH). Geneva, Switzerland, pp. 981–984.

Vaseghi, 1992, Restoration of old gramophone recordings, J. Audio Eng. Soc., 40, 791

Virag, 1999, Single channel speech enhancement based on masking properties of the human auditory system, IEEE Trans. Speech Audio Process., 7, 126, 10.1109/89.748118

Vuuren, S.V., Hermansky, H., 1998. On the importance of components of the modulation spectrum for speaker verification. In: Proc. Internat. Conf. Spoken Language Process. (ICSLP). Sydney, Australia, pp. 3205–3208.

Wang, 1982, The unimportance of phase in speech enhancement, IEEE Trans. Acoust. Speech Signal Process., 30, 679, 10.1109/TASSP.1982.1163920

Xiao, X., Chng, E., Li, H., 2007. Normalization of the speech modulation spectra for robust speech recognition. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 4. Honolulu, Hawaii, USA, pp. 1021–1024.

Zadeh, 1950, Frequency analysis of variable networks, Proc. IRE, 38, 291, 10.1109/JRPROC.1950.231083