An improved i-vector extraction algorithm for speaker verification

EURASIP Journal on Audio, Speech, and Music Processing - Volume 2015 - Pages 1-9 - 2015

Wei Li¹, Tianfan Fu², Jie Zhu¹

¹Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China

²Department of Computer Science and Engineering (CSE), Shanghai Jiao Tong University, Shanghai, China

Abstract

In recent years, the i-vector-based framework has been shown to provide state-of-the-art performance in speaker verification. Each utterance is projected onto a total factor space and represented by a low-dimensional feature vector. Channel compensation techniques are carried out in this low-dimensional feature space, and most of them take sets of extracted i-vectors as input. By constructing the between-class and within-class covariance matrices, these techniques attempt to minimize the within-class variance, which is mainly caused by channel effects, and to maximize the variance between speakers. In real-world applications, enrollment and test data from each user (or speaker) are usually scarce. Although session variability is widely thought to be caused mostly by channel effects, phonetic variability also contributes to it and still needs to be considered. In this paper, we propose a new algorithm for extracting i-vectors from the total factor matrix, which we term component reduction analysis (CRA). This algorithm contributes to better modelling of session variability in the total factor space. We report results on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation (SRE) dataset. As measured by both the equal error rate and the minimum value of the NIST detection cost function, a 10–15% relative improvement is achieved over the baseline of a traditional i-vector-based system.
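The abstract reports results as equal error rate (EER) and as a relative improvement over a baseline. As a rough illustration of how these two quantities are commonly computed, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names `equal_error_rate` and `relative_improvement`, the simple threshold sweep, and all scores and error rates used in the example are assumptions chosen for illustration only.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER with a sweep over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the miss rate and the
    false-alarm rate are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None  # (gap, eer_estimate, threshold)
    for t in thresholds:
        # Miss: a target (same-speaker) trial that is rejected.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target (impostor) trial that is accepted.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best is None or gap < best[0]:
            best = (gap, (miss + fa) / 2.0, t)
    return best[1], best[2]


def relative_improvement(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Illustrative scores only (not taken from the paper).
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, nontargets)
print(f"EER = {eer:.2%} at threshold {threshold}")

# Hypothetical error rates: a drop from 5.0 to 4.5 is a 10% relative gain.
print(f"relative improvement = {relative_improvement(5.0, 4.5):.0%}")
```

The sweep visits every observed score, so its cost grows with the product of the number of thresholds and the number of trials. That is acceptable for a small illustration, while large trial lists are usually handled by sorting the scores once and accumulating the error counts.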

