Blind Stochastic Feature Transformation for Channel Robust Speaker Verification

Journal of VLSI signal processing systems for signal, image and video technology - Vol. 42 - pp. 117-126 - 2006

K.K. Yiu¹, M. W. Mak¹, M. C. Cheung¹, S. Y. Kung²

¹Center for Multimedia Signal Processing, Department of Electronic & Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

²Department of Electrical Engineering, Princeton University, USA

Abstract

To improve the reliability of telephone-based speaker verification systems, channel compensation is indispensable. However, it is also important to ensure that the channel compensation algorithms in these systems suppress channel variations while enhancing interspeaker distinction. This paper addresses the problem with a blind feature-based transformation approach in which the transformation parameters are determined online without any a priori knowledge of channel characteristics. Specifically, a composite statistical model formed by fusing a speaker model and a background model is used to represent the characteristics of enrollment speech. Based on the difference between the claimant's speech and the composite model, a stochastic-matching type of approach is proposed to transform the claimant's speech to a region close to the enrollment speech. As a result, the algorithm can estimate the transformation online without having to detect the handset type. Experimental results based on the 2001 NIST evaluation set show that the proposed transformation approach achieves significant improvement in both equal error rate and minimum detection cost compared with cepstral mean subtraction and Znorm.
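The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance Gaussian mixture models, a fixed fusion weight `alpha`, and the simplest bias-only transformation y = x + b estimated by EM. The abstract does not specify the form of the transformation or how the two models are fused, and the function names `fuse_gmms` and `estimate_bias` are hypothetical.

```python
import numpy as np


def fuse_gmms(spk, bkg, alpha=0.5):
    """Build a composite GMM by pooling the components of two GMMs.

    Each GMM is a tuple (weights, means, variances) with diagonal
    covariances. The fusion weight alpha is an assumption of this sketch.
    """
    weights = np.concatenate([alpha * spk[0], (1.0 - alpha) * bkg[0]])
    means = np.vstack([spk[1], bkg[1]])
    variances = np.vstack([spk[2], bkg[2]])
    return weights, means, variances


def log_component_densities(x, gmm):
    """Return log(w_j * N(x_t; mu_j, var_j)) as an array of shape (T, M)."""
    weights, means, variances = gmm
    diff = x[:, None, :] - means[None, :, :]                        # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None], axis=2)
    return np.log(weights)[None, :] + log_norm[None, :] + log_exp


def estimate_bias(x, gmm, n_iter=10):
    """Estimate b in y = x + b by EM so that y fits the composite GMM.

    x holds the claimant's feature vectors, shape (T, D). No handset
    label or channel information is used; only x and the model.
    """
    _, means, variances = gmm
    inv_var = 1.0 / variances                                       # (M, D)
    b = np.zeros(x.shape[1])
    for _ in range(n_iter):
        # E-step: component posteriors of the currently transformed frames.
        log_p = log_component_densities(x + b, gmm)
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)                     # (T, M)
        # M-step: closed-form, precision-weighted average of (mu_j - x_t).
        resid = (means[None] - x[:, None, :]) * inv_var[None]       # (T, M, D)
        num = np.einsum("tm,tmd->d", post, resid)
        den = np.einsum("tm,md->d", post, inv_var)
        b = num / den
    return b
```

Under these assumptions, the verification score would be computed on `x + estimate_bias(x, composite)` rather than on `x` itself. A first-order transformation of the form y = A x + b would follow the same EM pattern with a larger M-step.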

