Single-channel speech enhancement using spectral subtraction in the short-time modulation domain
Abstract
Keywords
References
Allen, 1977, Short term spectral analysis, synthesis, and modification by discrete Fourier transform, IEEE Trans. Acoust. Speech Signal Process., 25, 235, 10.1109/TASSP.1977.1162950
Allen, 1977, A unified approach to short-time Fourier analysis and synthesis, Proc. IEEE, 65, 1558, 10.1109/PROC.1977.10770
Arai, T., Pavel, M., Hermansky, H., Avendano, C., 1996. Intelligibility of speech with filtered time trajectories of spectral envelopes. In: Proc. Internat. Conf. Spoken Language Process. (ICSLP). Philadelphia, PA, USA, pp. 2490–2493.
Atlas, L., 2003. Modulation spectral transforms: Application to speech separation and modification. Tech. Rep. 155, IEICE, Univ. Washington, Washington, WA, USA.
Atlas, L., Li, Q., Thompson, J., 2004. Homomorphic modulation spectra. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 2. Montreal, Quebec, Canada, pp. 761–764.
Atlas, 2003, Joint acoustic and modulation frequency, EURASIP J. Appl. Signal Process., 2003, 668, 10.1155/S1110865703305013
Atlas, L., Vinton, M., 2001. Modulation frequency and efficient audio coding. In: Proc. SPIE, the International Society for Optical Engineering, vol. 4474. pp. 1–8.
Bacon, 1989, Modulation masking: Effects of modulation frequency, depth and phase, J. Acoust. Soc. Amer., 85, 2575, 10.1121/1.397751
Berouti, M., Schwartz, R., Makhoul, J., 1979. Enhancement of speech corrupted by acoustic noise. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 4. Washington, DC, USA, pp. 208–211.
Boll, 1979, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust. Speech Signal Process., ASSP-27, 113, 10.1109/TASSP.1979.1163209
Cappe, 1994, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor, IEEE Trans. Speech Audio Process., 2, 345, 10.1109/89.279283
Crochiere, 1980, A weighted overlap-add method of short-time Fourier analysis/synthesis, IEEE Trans. Acoust. Speech Signal Process., ASSP-28, 99, 10.1109/TASSP.1980.1163353
Depireux, 2001, Spectrotemporal response field characterization with dynamic ripples in ferret primary auditory cortex, J. Neurophysiol., 85, 1220, 10.1152/jn.2001.85.3.1220
Drullman, 1994, Effect of reducing slow temporal modulations on speech reception, J. Acoust. Soc. Amer., 95, 2670, 10.1121/1.409836
Drullman, 1994, Effect of temporal envelope smearing on speech reception, J. Acoust. Soc. Amer., 95, 1053, 10.1121/1.408467
Ephraim, 1984, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process., ASSP-32, 1109, 10.1109/TASSP.1984.1164453
Ephraim, 1985, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Trans. Acoust. Speech Signal Process., ASSP-33, 443, 10.1109/TASSP.1985.1164550
Falk, T., Stadler, S., Kleijn, W.B., Chan, W.-Y., 2007. Noise suppression based on extending a speech-dominated modulation band. In: Proc. ISCA Conf. Internat. Speech Commun. Assoc. (INTERSPEECH). Antwerp, Belgium, pp. 970–973.
Goldsworthy, 2004, Analysis of speech-based speech transmission index methods with implications for nonlinear operations, J. Acoust. Soc. Amer., 116, 3679, 10.1121/1.1804628
Greenberg, S., Arai, T., 2001. The relation between speech intelligibility and the complex modulation spectrum. In: Proc. ISCA European Conf. Speech Commun. and Technology (EUROSPEECH). Aalborg, Denmark, pp. 473–476.
Griffin, 1984, Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoust. Speech Signal Process., ASSP-32, 236, 10.1109/TASSP.1984.1164317
Hasan, 2004, A modified a priori SNR for speech enhancement using spectral subtraction rules, IEEE Signal Process. Lett., 11, 450, 10.1109/LSP.2004.824017
Hermansky, 1994, RASTA processing of speech, IEEE Trans. Speech Audio Process., 2, 578, 10.1109/89.326616
Hermansky, H., Wan, E., Avendano, C., 1995. Speech enhancement based on temporal processing. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 1. Detroit, MI, USA, pp. 405–408.
Houtgast, 1985, A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria, J. Acoust. Soc. Amer., 77, 1069, 10.1121/1.392224
Hu, 2004, Speech enhancement based on wavelet thresholding the multitaper spectrum, IEEE Trans. Speech Audio Process., 12, 59, 10.1109/TSA.2003.819949
Hu, 2007, Subjective comparison and evaluation of speech enhancement algorithms, Speech Comm., 49, 588, 10.1016/j.specom.2006.12.006
Kamath, S., Loizou, P., 2002. A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP). Orlando, FL, USA.
Kanedera, 1999, On the relative importance of various components of the modulation spectrum for automatic speech recognition, Speech Comm., 28, 43, 10.1016/S0167-6393(99)00002-3
Kim, 2004, A cue for objective speech quality estimation in temporal envelope representations, IEEE Signal Process. Lett., 11, 849, 10.1109/LSP.2004.835466
Kim, 2005, ANIQUE: An auditory model for single-ended speech quality estimation, IEEE Trans. Speech Audio Process., 13, 821, 10.1109/TSA.2005.851924
Kingsbury, 1998, Robust speech recognition using the modulation spectrogram, Speech Comm., 25, 117, 10.1016/S0167-6393(98)00032-6
Kinnunen, T., 2006. Joint acoustic-modulation frequency for speaker recognition. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 1. Toulouse, France, pp. 665–668.
Kinnunen, T., Lee, K., Li, H., 2008. Dimension reduction of the modulation spectrogram for speaker verification. In: Proc. ISCA Speaker and Language Recognition Workshop (ODYSSEY). Stellenbosch, South Africa.
Kowalski, 1996, Analysis of dynamic spectra in ferret primary auditory cortex: I. Characteristics of single unit responses to moving ripple spectra, J. Neurophysiol., 76, 3503, 10.1152/jn.1996.76.5.3503
Lim, 1979, Enhancement and bandwidth compression of noisy speech, Proc. IEEE, 67, 1586, 10.1109/PROC.1979.11540
Loizou, 2007
Lu, 2007, Reduction of musical residual noise for speech enhancement using masking properties and optimal smoothing, Pattern Recognition Lett., 28, 1300, 10.1016/j.patrec.2007.03.001
Lu, 2010, Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition, Speech Comm., 52, 1, 10.1016/j.specom.2009.08.006
Lyons, J., Paliwal, K., 2008. Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement. In: Proc. ISCA Conf. Internat. Speech Commun. Assoc. (INTERSPEECH). Brisbane, Australia, pp. 387–390.
Malayath, 2000, Data-driven temporal filters and alternatives to GMM in speaker verification, Digital Signal Process., 10, 55, 10.1006/dspr.1999.0363
Mesgarani, N., Shamma, S., 2005. Speech enhancement based on filtering the spectrotemporal modulations. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 1. Philadelphia, PA, USA, pp. 1105–1108.
Nadeu, 1997, Filtering the time sequences of spectral parameters for speech recognition, Speech Comm., 22, 315, 10.1016/S0167-6393(97)00030-7
Paliwal, 2008, Effect of analysis window duration on speech intelligibility, IEEE Signal Process. Lett., 15, 785, 10.1109/LSP.2008.2005755
Payton, 1999, A method to determine the speech transmission index from speech waveforms, J. Acoust. Soc. Amer., 106, 3637, 10.1121/1.428216
Payton, 2002, Computing the STI using speech as a probe stimulus, 125
Portnoff, 1981, Short-time Fourier analysis of sampled speech, IEEE Trans. Acoust. Speech Signal Process., ASSP-29, 364, 10.1109/TASSP.1981.1163580
Quatieri, 2002
Rix, A., Beerends, J., Hollier, M., Hekstra, A., 2001. Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. ITU-T Recommendation P.862.
Schreiner, 1986, Representation of amplitude modulation in the auditory cortex of the cat. I. The anterior auditory field (AAF), Hearing Res., 21, 227, 10.1016/0378-5955(86)90221-2
Shamma, 1996, Auditory cortical representation of complex acoustic spectra as inferred from the ripple analysis method, Network: Comput. Neural Syst., 7, 439, 10.1088/0954-898X/7/3/001
Shannon, B., Paliwal, K., 2006. Role of phase estimation in speech enhancement. In: Proc. Internat. Conf. Spoken Language Process. (ICSLP). Pittsburgh, PA, USA, pp. 1423–1426.
Sheft, 1990, Temporal integration in amplitude modulation detection, J. Acoust. Soc. Amer., 88, 796, 10.1121/1.399729
Steeneken, 1980, A physical method for measuring speech-transmission quality, J. Acoust. Soc. Amer., 67, 318, 10.1121/1.384464
Thompson, J., Atlas, L., 2003. A non-uniform modulation transform for audio coding with increased time resolution. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 5. Hong Kong, pp. 397–400.
Tyagi, V., McCowan, I., Bourlard, H., Misra, H., 2003. On factorizing spectral dynamics for robust speech recognition. In: Proc. ISCA European Conf. Speech Commun. and Technology (EUROSPEECH). Geneva, Switzerland, pp. 981–984.
Vaseghi, 1992, Restoration of old gramophone recordings, J. Audio Eng. Soc., 40, 791
Virag, 1999, Single channel speech enhancement based on masking properties of the human auditory system, IEEE Trans. Speech Audio Process., 7, 126, 10.1109/89.748118
Vuuren, S.V., Hermansky, H., 1998. On the importance of components of the modulation spectrum for speaker verification. In: Proc. Internat. Conf. Spoken Language Process. (ICSLP). Sydney, Australia, pp. 3205–3208.
Wang, 1982, The unimportance of phase in speech enhancement, IEEE Trans. Acoust. Speech Signal Process., 30, 679, 10.1109/TASSP.1982.1163920
Xiao, X., Chng, E., Li, H., 2007. Normalization of the speech modulation spectra for robust speech recognition. In: Proc. IEEE Internat. Conf. Acoustics, Speech, and Signal Process. (ICASSP), vol. 4. Honolulu, Hawaii, USA, pp. 1021–1024.