An overview of text-independent speaker recognition: From features to supervectors

Elsevier BV - Volume 52, Issue 1 - Pages 12-40 - 2010
Tomi Kinnunen, Haizhou Li

References

Adami, 2007, Modeling prosodic differences for speaker recognition, Speech Comm., 49, 277, 10.1016/j.specom.2007.02.005 Adami, A., Mihaescu, R., Reynolds, D., Godfrey, J., 2003. Modeling prosodic dynamics for speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong, China, April 2003, pp. 788–791. Alexander, A., Botti, F., Dessimoz, D., Drygajlo, A., 2004. The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications. Forensic Science International 146S, December 2004, pp. 95–99. Alku, 1999, A method for generating natural-sounding speech stimuli for cognitive brain research, Clin. Neurophysiol., 110, 1329, 10.1016/S1388-2457(99)00088-7 Altincay, 2003, Speaker identification by combining multiple classifiers using Dempster–Shafer theory of evidence, Speech Comm., 41, 531, 10.1016/S0167-6393(03)00032-3 Ambikairajah, E., 2007. Emerging features for speaker recognition. In: Proc. Sixth Internat. IEEE Conf. on Information, Communications & Signal Processing, Singapore, December 2007, pp. 1–7. Andrews, W., Kohler, M., Campbell, J., 2001. Phonetic speaker recognition. In: Proc. Seventh European Conf. on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, September 2001, pp. 2517–2520. Andrews, W., Kohler, M., Campbell, J., Godfrey, J., Hernandez-Cordero, J., 2002. Gender-dependent phonetic refraction for speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2002), Vol. 1, Orlando, Florida, USA, May 2002, pp. 149–152. Arcienega, M., Drygajlo, A., 2001. Pitch-dependent GMMs for text-independent speaker recognition systems. In: Proc. Seventh European Conf. on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, September 2001, pp. 2821–2824. Ashour, G., Gath, I., 1999. Characterization of speech during imitation. In: Proc. Sixth European Conf. 
on Speech Communication and Technology (Eurospeech 1999), Budapest, Hungary, September 1999, pp. 1187–1190. Atal, 1972, Automatic speaker recognition based on pitch contours, J. Acoust. Soc. Amer., 52, 1687, 10.1121/1.1913303 Atal, 1974, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, J. Acoust. Soc. Amer., 55, 1304, 10.1121/1.1914702 Atlas, 2003, Joint acoustic and modulation frequency, EURASIP J. Appl. Signal Process., 7, 668, 10.1155/S1110865703305013 Auckenthaler, R., Mason, J., 2001. Gaussian selection applied to text-independent speaker verification. In: Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2001), Crete, Greece, June 2001, pp. 83–88. Auckenthaler, 2000, Score normalization for text-independent speaker verification systems, Digital Signal Process., 10, 42, 10.1006/dspr.1999.0360 Bartkova, K., Gac, D.L., Charlet, D., Jouvet, D., 2002. Prosodic parameter for speaker identification. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2002), Denver, Colorado, USA, September 2002, pp. 1197–1200. Benyassine, 1997, ITU-T recommendation g729 annex b: a silence compression scheme for use with g729 optimized for v.70 digital simultaneous voice and data applications, IEEE Comm. Mag., 35, 64, 10.1109/35.620527 BenZeghiba, M., Bourlard, H., 2003. On the combination of speech and speaker recognition. In: Proc. Eighth European Conf. on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, September 2003, pp. 1361–1364. 
BenZeghiba, 2006, User-customized password speaker verification using multiple reference and background models, Speech Comm., 48, 1200, 10.1016/j.specom.2005.08.008 Besacier, 2000, Subband architecture for automatic speaker recognition, Signal Process., 80, 1245, 10.1016/S0165-1684(00)00033-5 Besacier, 2000, Localization and selection of speaker-specific information with statistical modeling, Speech Comm., 31, 89, 10.1016/S0167-6393(99)00070-9 Bimbot, 1995, Second-order statistical measures for text-independent speaker identification, Speech Comm., 17, 177, 10.1016/0167-6393(95)00013-E Bimbot, 2004, A tutorial on text-independent speaker verification, EURASIP J. Appl. Signal Process., 4, 430, 10.1155/S1110865704310024 Bishop, 2006 Bocklet, T., Shriberg, E., 2009. Speaker recognition using syllable-based constraints for cepstral frame selection. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4525–4528. Boersma, P., Weenink, D., 2009. Praat: doing phonetics by computer [computer program]. WWW page, June 2009, <http://www.praat.org/>. Bonastre, J.-F., Matrouf, D., Fredouille, C., 2007. Artificial impostor voice transformation effects on false acceptance rates. In: Proc. Interspeech 2007 (ICSLP), Antwerp, Belgium, August 2007, pp. 2053–2056. Brümmer, 2006, Application-independent evaluation of speaker detection, Comput. Speech Lang., 20, 230, 10.1016/j.csl.2005.08.001 Brümmer, 2007, Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006, IEEE Trans. Audio, Speech Language Process., 15, 2072, 10.1109/TASL.2007.902870 Burget, 2007, Analysis of feature extraction and channel compensation in a GMM speaker recognition system, IEEE Trans. 
Audio, Speech Language Process., 15, 1979, 10.1109/TASL.2007.902499 Burget, L., Brümmer, N., Reynolds, D., Kenny, P., Pelecanos, J., Vogt, R., Castaldo, F., Dehak, N., Dehak, R., Glembek, O., Karam, Z., Noecker, J., Na, E., Costin, C., Hubeika, V., Kajarekar, S., Scheffer, N., and Černocký, J. 2009. Robust speaker recognition over varying channels – report from JHU workshop 2008. Technical report, March 2009, (URL valid June 2009). <http://www.clsp.jhu.edu/workshops/ws08/documents/jhu_report_main.pdf>. Burton, 1987, Text-dependent speaker verification using vector quantization source coding, IEEE Trans. Acoustics, Speech, Signal Process., 35, 133, 10.1109/TASSP.1987.1165110 Campbell, 1997, Speaker recognition: a tutorial, Proc. IEEE, 85, 1437, 10.1109/5.628714 Campbell, 2002, Speaker recognition with polynomial classifiers, IEEE Trans. Speech Audio Process., 10, 205, 10.1109/TSA.2002.1011533 Campbell, 2004, Phonetic speaker recognition with support vector machines, Vol. 16 Campbell, W., Sturim, D., Reynolds, D., 2005. SVM based speaker verification using a GMM supervector kernel and NAP variability compensation. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2005), Philadelphia, USA, March 2005, pp. 637–640. Campbell, 2006, Support vector machines for speaker and language recognition, Comput. Speech Lang., 20, 210, 10.1016/j.csl.2005.06.003 Campbell, 2006, Support vector machines using GMM supervectors for speaker verification, IEEE Signal Process. Lett., 13, 308, 10.1109/LSP.2006.870086 Carey, M., Parris, E., Lloyd-Thomas, H., Bennett, S., 1996. Robust prosodic features for speaker identification. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 1996), Philadelphia, Pennsylvania, USA, 1996, pp. 1800–1803. Castaldo, 2007, Compensation of nuisance factors for speaker and language recognition, IEEE Trans. 
Audio, Speech Language Process., 15, 1969, 10.1109/TASL.2007.901823 Chan, 2007, Discrimination power of vocal source and vocal tract related features for speaker segmentation, IEEE Trans. Audio, Speech Language Process., 15, 1884, 10.1109/TASL.2007.900103 Charbuillet, C., Gas, B., Chetouani, M., Zarader, J., 2006. Filter bank design for speaker diarization based on genetic algorithms. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2006), Vol. 1, Toulouse, France, May 2006, pp. 673–676. Chaudhari, 2003, Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition, IEEE Trans. Speech Audio Process., 11, 61, 10.1109/TSA.2003.809121 Chen, 1997, Methods of combining multiple classifiers with different features and their applications to text-independent speaker recognition, Internat. J. Pattern Recognition Artif. Intell., 11, 417, 10.1142/S0218001497000196 Chen, Z.-H., Liao, Y.-F., Juang, Y.-T., 2004. Eigen-prosody analysis for robust speaker recognition under mismatch handset environment. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2004), Jeju, South Korea, October 2004, pp. 1421–1424. Chetouani, 2009, Investigation on LP-residual presentations for speaker identification, Pattern Recognition, 42, 487, 10.1016/j.patcog.2008.08.008 Cheveigné, A., Kawahara, H., 2001. Comparative evaluation of f0 estimation algorithms. In: Proc. Seventh European Conf. on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, September 2001, pp. 2451–2454. 
Damper, 2003, Improving speaker identification in noise by subband processing and decision fusion, Pattern Recognition Lett., 24, 2167, 10.1016/S0167-8655(03)00082-5 Davis, 1980, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoustics, Speech, Signal Process., 28, 357, 10.1109/TASSP.1980.1163420 DeCheveigne, 2002, YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Amer., 111, 1917, 10.1121/1.1458024 Dehak, N., Chollet, G., 2006. Support vector GMMs for speaker verification. In: Proc. IEEE Odyssey: the Speaker and Language Recognition Workshop (Odyssey 2006), San Juan, Puerto Rico, June 2006. Dehak, 2007, Modeling prosodic features with joint factor analysis for speaker verification, IEEE Trans. Audio, Speech Language Process., 15, 2095, 10.1109/TASL.2007.902758 Dehak, N., Dehak, R., Kenny, P., Dumouchel, P., 2008. Comparison between factor analysis and GMM support vector machines for speaker verification. In: The Speaker and Language Recognition Workshop (Odyssey 2008), Stellenbosch, South Africa, January 2008. Paper 009. Dehak, N., Kenny, P., Dehak, R., Glembek, O., Dumouchel, P., Burget, L., Hubeika, V., Castaldo, F., 2009. Support vector machines and joint factor analysis for speaker verification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4237–4240. Deller, 2000 Doddington, G., 2001. Speaker recognition based on idiolectal differences between speakers. In: Proc. Seventh European Conf. on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, September 2001, pp. 2521–2524. Dunn, R., Quatieri, T., Reynolds, D., Campbell, J. 2001. Speaker recognition from coded speech and the effects of score normalization. In: Proc. 35th Asilomar Conf. on Signals, Systems and Computers, Vol. 2, Pacific Grove, California, USA, November 2001, pp. 1562–1567. 
Espy-Wilson, C., Manocha, S., Vishnubhotla, S., 2006. A new set of features for text-independent speaker identification. In: Proc. Interspeech 2006 (ICSLP), Pittsburgh, Pennsylvania, USA, September 2006, pp. 1475–1478. Ezzaidi, H., Rouat, J., O’Shaughnessy, D., 2001. Towards combining pitch and MFCC for speaker identification systems. In: Proc. Seventh European Conf. on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, September 2001, pp. 2825–2828. Faltlhauser, R., Ruske, G., 2001. Improving speaker recognition performance using phonetically structured gaussian mixture models. In: Proc. Seventh European Conf. on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, September 2001, pp. 751–754. Farrell, 1994, Speaker recognition using neural networks and conventional classifiers, IEEE Trans. Speech Audio Process., 2, 194, 10.1109/89.260362 Farrell, K., Ramachandran, R., Mammone, R., 1998. An analysis of data fusion methods for speaker verification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1998), Vol. 2, Seattle, Washington, USA, pp. 1129–1132. Fauve, 2007, State-of-the-art performance in text-independent speaker verification through open-source software, IEEE Trans. Audio, Speech Language Process., 15, 1960, 10.1109/TASL.2007.902877 Fauve, B., Evans, N., Mason, J., 2008. Improving the performance of text-independent short duration SVM- and GMM-based speaker verification. In: The Speaker and Language Recognition Workshop (Odyssey 2008), Stellenbosch, South Africa, January 2008. Paper 018. Ferrer, L., Shriberg, E., Kajarekar, S., Sönmez, K., 2007. Parameterization of prosodic feature distributions for SVM modeling in speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2007), Vol. 4, Honolulu, Hawaii, USA, April 2007, pp. 233–236. Ferrer, L., Graciarena, M., Zymnis, A., Shriberg, E., 2008. 
System combination using auxiliary information for speaker verification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2008), Las Vegas, Nevada, March–April 2008, pp. 4853–4856. Ferrer, L., Sönmez, K., Shriberg, E., 2008. An anticorrelation kernel for improved system combination in speaker verification. In: The Speaker and Language Recognition Workshop (Odyssey 2008), Stellenbosch, South Africa, January 2008. Paper 022. Fredouille, 2000, AMIRAL: a block-segmental multirecognizer architecture for automatic speaker recognition, Digital Signal Process., 10, 172, 10.1006/dspr.1999.0367 Furui, 1981, Cepstral analysis technique for automatic speaker verification, IEEE Trans. Acoustics, Speech Signal Process., 29, 254, 10.1109/TASSP.1981.1163530 Furui, 1997, Recent advances in speaker recognition, Pattern Recognition Lett., 18, 859, 10.1016/S0167-8655(97)00073-1 Garcia-Romero, D., Fierrez-Aguilar, J., Gonzalez-Rodriguez, J., Ortega-Garcia, J., 2004. On the use of quality measures for text-independent speaker recognition. In: Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2004), Vol. 4, Toledo, Spain, May 2004, pp. 105–110. Gersho, 1991 Glembek, O., Burget, L., Dehak, N., Brümmer, N., Kenny, P., 2009. Comparison of scoring methods used in speaker recognition with joint factor analysis. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4057–4060. Gong, W.-G., Yang, L.-P., Chen, D., 2008. Pitch synchronous based feature extraction for noise-robust speaker verification. In: Proc. Image and Signal Processing (CISP 2008), Vol. 5, (May 2008), pp. 295–298. Gonzalez-Rodriguez, J., Garcia-Gomar, D. G.-R. M., Ramos-Castro, D., Ortega-Garcia, J., 2003. Robust likelihood ratio estimation in Bayesian forensic speaker recognition. In: Proc. 8th European Conf. on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, September 2003, pp. 693–696. 
Gopalan, 1999, A comparison of speaker identification results using features based on cepstrum and Fourier–Bessel expansion, IEEE Trans. Speech Audio Process., 7, 289, 10.1109/89.759036 Gudnason, J., Brookes, M., 2008. Voice source cepstrum coefficients for speaker identification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2008), Las Vegas, Nevada, March–April 2008, pp. 4821–4824. Gupta, 1992, Text-independent speaker verification based on broad phonetic segmentation of speech, Digital Signal Process., 2, 69, 10.1016/1051-2004(92)90027-V Hannani, A., Petrovska-Delacrétaz, D., Chollet, G., 2004. Linear and non-linear fusion of ALISP-based and GMM systems for text-independent speaker verification. In Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2004), Toledo, Spain, May 2004, pp. 111–116. Hansen, E., Slyh, R., Anderson, T., 2004. Speaker recognition using phoneme-specific GMMs. In: Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2004), Toledo, Spain, May 2004, pp. 179–184. Harrington, 1999 Harris, 1978, On the use of windows for harmonic analysis with the discrete fourier transform, Proc. IEEE, 66, 51, 10.1109/PROC.1978.10837 Hatch, A., Stolcke, A., 2006. Generalized linear kernels for one-versus-all classification: application to speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France, May 2006, pp. 585–588. Hatch, A., Stolcke, A., Peskin, B., 2005. Combining feature sets with support vector machines: application to speaker recognition. In: The 2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), November 2005, pp. 75–79. Hatch, A., Kajarekar, S., Stolcke, A., 2006. Within-class covariance normalization for SVM-based speaker recognition. In: Proc. Interspeech 2006 (ICSLP), Pittsburgh, Pennsylvania, USA, September 2006, pp. 1471–1474. Hautamäki, V., Tuononen, M., Niemi-Laitinen, T., Fränti, P., 2007. 
Improving speaker verification by periodicity based voice activity detection. In: Proc. 12th Internat. Conf. on Speech and Computer (SPECOM 2007), Moscow, Russia, October 2007, pp. 645–650. Hautamäki, 2008, Text-independent speaker recognition using graph matching, Pattern Recognition Lett., 29, 1427, 10.1016/j.patrec.2008.02.021 Hautamäki, 2008, Maximum a posteriori estimation of the centroid model for speaker verification, IEEE Signal Process. Lett., 15, 162, 10.1109/LSP.2007.914792 Hébert, 2008, Text-dependent speaker recognition, 743, 10.1007/978-3-540-49127-9_37 Hébert, M., Heck, L., 2003. Phonetic class-based speaker verification. In: Proc. Eighth European Conf. on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, September 2003, pp. 1665–1668. Heck, L., Genoud, D., 2002. Combining speaker and speech recognition systems. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2002), Denver, Colorado, USA, September 2002, pp. 1369–1372. Heck, L., and Weintraub, M. 1997. Handset-dependent background models for robust text-independent speaker recognition. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1997) (Munich, Germany, April 1997), pp. 1071–1074. Heck, 2000, Robustness to telephone handset distortion in speaker recognition by discriminative feature design, Speech Comm., 31, 181, 10.1016/S0167-6393(99)00077-1 Hedge, R., Murthy, H., Rao, G., 2004. Application of the modified group delay function to speaker identification and discrimination. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2004), Vol. 1, Montreal, Canada, May 2004, pp. 517–520. He, 1999, A discriminative training algorithm for VQ-based speaker identification, IEEE Trans. Speech Audio Process., 7, 353, 10.1109/89.759047 Hermansky, 1990, Perceptual linear prediction (PLP) analysis for speech, J. Acoust. Soc. 
Amer., 87, 1738, 10.1121/1.399423 Hermansky, 1998, Should recognizers have ears?, Speech Comm., 25, 3, 10.1016/S0167-6393(98)00027-2 Hermansky, 1994, RASTA processing of speech, IEEE Trans. Speech Audio Process., 2, 578, 10.1109/89.326616 Hess, 1983 Higgins, 1991, Speaker verification using randomized phrase prompting, Digital Signal Process., 1, 89, 10.1016/1051-2004(91)90098-6 Huang, 2001 Imperl, 1997, A study of harmonic features for the speaker recognition, Speech Comm., 22, 385, 10.1016/S0167-6393(97)00053-8 Jain, 2000, Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Machine Intell., 22, 4, 10.1109/34.824819 Jang, 2002, Learning statistically efficient features for speaker recognition, Neurocomputing, 49, 329, 10.1016/S0925-2312(02)00527-1 Jin, Q., Schultz, T., Waibel, A., 2002. Speaker identification using multilingual phone strings. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2002), Vol. 1, Orlando, Florida, USA, May 2002, pp. 145–148. Kajarekar, S., Hermansky, H., 2001. Speaker verification based on broad phonetic categories. In: Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2001), Crete, Greece, June 2001, pp. 201–206. Karam, Z., Campbell, W., 2007. A new kernel for SVM MLLR based speaker recognition. In: Proc. Interspeech 2007 (ICSLP), Antwerp, Belgium, August 2007, pp. 290–293. Karpov, E., Kinnunen, T., Fränti, P., 2004. Symmetric distortion measure for speaker recognition. In: Proc. Ninth Internat. Conf. on Speech and Computer (SPECOM 2004), St. Petersburg, Russia, September 2004, pp. 366–370. Kenny, P., 2006. Joint factor analysis of speaker and session variability: theory and algorithms. Technical Report CRIM-06/08-14. Kenny, 2007, Speaker and session variability in GMM-based speaker verification, IEEE Trans. Audio, Speech Language Process., 15, 1448, 10.1109/TASL.2007.894527 Kenny, 2008, A study of inter-speaker variability in speaker verification, IEEE Trans. 
Audio, Speech Language Process., 16, 980, 10.1109/TASL.2008.925147 Kinnunen, T., 2002. Designing a speaker-discriminative adaptive filter bank for speaker recognition. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2002), Denver, Colorado, USA, September 2002, pp. 2325–2328. Kinnunen, T., 2004. Spectral Features for Automatic Text-Independent Speaker Recognition. Licentiate’s Thesis, University of Joensuu, Department of Computer Science, Joensuu, Finland. Kinnunen, T., 2006. Joint acoustic-modulation frequency for speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2006), Vol. I, Toulouse, France, 2006, pp. 665–668. Kinnunen, T., Alku, P., 2009. On separating glottal source and vocal tract information in telephony speaker verification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4545–4548. Kinnunen, T., González-Hautamäki, R., 2005. Long-term f0 modeling for text-independent speaker recognition. In: Proc. 10th Internat. Conf. on Speech and Computer (SPECOM’2005), Patras, Greece, October 2005, pp. 567–570. Kinnunen, T., Kilpeläinen, T., Fränti, P., 2000. Comparison of clustering algorithms in speaker identification. In: Proc. IASTED Internat. Conf. on Signal Processing and Communications (SPC 2000), Marbella, Spain, September 2000, pp. 222–227. Kinnunen, T., Hautamäki, V., Fränti, P., 2004. Fusion of spectral feature sets for accurate speaker identification. In: Proc. Ninth Internat. Conf. on Speech and Computer (SPECOM 2004), St. Petersburg, Russia, September 2004, pp. 361–365. Kinnunen, T., Hautamäki, V., Fränti, P., 2006. On the use of long-term average spectrum in automatic speaker recognition. In: 5th Internat. Symposium on Chinese Spoken Language Processing (ISCSLP’06), Singapore, December 2006, pp. 559–567. Kinnunen, 2006, Real-time speaker identification and verification, IEEE Trans. 
Audio, Speech Language Process., 14, 277, 10.1109/TSA.2005.853206 Kinnunen, T., Koh, C., Wang, L., Li, H., Chng, E., 2006. Temporal discrete cosine transform: Towards longer term temporal features for speaker verification. In: Proc. Fifth Internat. Symposium on Chinese Spoken Language Processing (ISCSLP 2006), Singapore, December 2006, pp. 547–558. Kinnunen, T., Zhang, B., Zhu, J., Wang, Y., 2007. Speaker verification with adaptive spectral subband centroids. In: Proc. Internat. Conf. on Biometrics (ICB 2007), Seoul, Korea, August 2007, pp. 58–66. Kinnunen, T., Lee, K.-A., Li, H. 2008. Dimension reduction of the modulation spectrogram for speaker verification. In: The Speaker and Language Recognition Workshop (Odyssey 2008), Stellenbosch, South Africa, January 2008. Kinnunen, 2009, Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification, Pattern Recognition Lett., 30, 341, 10.1016/j.patrec.2008.11.007 Kitamura, T., 2008. Acoustic analysis of imitated voice produced by a professional impersonator. In: Proc. Interspeech 2008, September 2008, pp. 813–816. Kittler, 1998, On combining classifiers, IEEE Trans. Pattern Anal. Machine Intell., 20, 226, 10.1109/34.667881 Kolano, G., Regel-Brietzmann, P., 1999. Combination of vector quantization and Gaussian mixture models for speaker verification. In: Proc. Sixth European Conf. on Speech Communication and Technology (Eurospeech 1999), Budapest, Hungary, September 1999, pp. 1203–1206. Kryszczuk, 2007, Reliability-based decision fusion in multimodal biometric verification systems, EURASIP J. Adv. Signal Process., 1 Lapidot, 2002, Unsupervised speaker recognition based on competition between self-organizing maps, IEEE Trans. Neural Networks, 13, 877, 10.1109/TNN.2002.1021888 Laskowski, K., Jin, Q., 2009. Modeling instantaneous intonation for speaker identification using the fundamental frequency variation spectrum. In: Proc. Internat. Conf. 
on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4541–4544. Lee, K.-A., You, C., Li, H., Kinnunen, T., 2007. A GMM-based probabilistic sequence kernel for speaker verification. In: Proc. Interspeech 2007 (ICSLP), Antwerp, Belgium, August 2007, pp. 294–297. Lee, K., You, C., Li, H., Kinnunen, T., Zhu, D., 2008. Characterizing speech utterances for speaker verification with sequence kernel SVM. In: Proc. Ninth Interspeech (Interspeech 2008), Brisbane, Australia, September 2008, pp. 1397–1400. Leeuwen, 2006, NIST and NFI-TNO evaluations of automatic speaker recognition, Comput. Speech Lang., 20, 128, 10.1016/j.csl.2005.07.001 Leggetter, 1995, Maximum likelihood linear regression for speaker adaptation of continuous density HMMs, Comput. Speech Lang., 9, 171, 10.1006/csla.1995.0010 Lei, H., Mirghafori, N., 2007. Word-conditioned HMM supervectors for speaker recognition. In: Proc. Interspeech 2007 (ICSLP), Antwerp, Belgium, August 2007, pp. 746–749. Leung, 2006, Adaptive articulatory feature-based conditional pronunciation modeling for speaker verification, Speech Comm., 48, 71, 10.1016/j.specom.2005.05.013 Li, K.-P., Porter, J., 1988. Normalizations and selection of speech segments for speaker recognition scoring. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1988), New York, USA, April 1988, pp. 595–598. Li, H., Ma, B., Lee, K.-A., Sun, H., Zhu, D., Sim, K., You, C., Tong, R., Kärkkäinen, I., Huang, C.-L., Pervouchine, V., Guo, W., Li, Y., Dai, L., Nosratighods, M., Tharmarajah, T., Epps, J., Ambikairajah, E., Chng, E.-S., Schultz, T., Jin, Q., 2009. The I4U system in NIST 2008 speaker recognition evaluation. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4201–4204. Linde, 1980, An algorithm for vector quantizer design, IEEE Trans. 
Comm., 28, 84, 10.1109/TCOM.1980.1094577 Longworth, 2007, Combining derivative and parametric kernels for speaker verification, IEEE Trans. Audio, Speech Language Process., 6, 1 Louradour, J., Daoudi, K., 2005. SVM speaker verification using a new sequence kernel. In: Proc. 13th European Conf. on Signal Processing (EUSIPCO 2005), Antalya, Turkey, September 2005. Louradour, J., Daoudi, K., André-Obrecht, R., 2005. Discriminative power of transient frames in speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2005), Vol. 1, Philadelphia, USA, 2005, pp. 613–616. Lu, 2007, An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification, Speech Comm., 50, 312, 10.1016/j.specom.2007.10.005 Ma, B., Zhu, D., Tong, R., 2006. Chinese dialect identification using tone features based on pitch flux. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2006), Vol. 1, Toulouse, France, May 2006, pp. 1029–1032. Ma, B., Zhu, D., Tong, R., Li, H., 2006. Speaker cluster based GMM tokenization for speaker recognition. In: Proc. Interspeech 2006 (ICSLP), Pittsburgh, Pennsylvania, USA, September 2006, pp. 505–508. Ma, 2007, Spoken language recognition with ensemble classifiers, IEEE Trans. Audio, Speech Language Process., 15, 2053, 10.1109/TASL.2007.902861 Magrin-Chagnolleau, 2002, Application of time–frequency principal component analysis to text-independent speaker identification, IEEE Trans. Speech Audio Process., 10, 371, 10.1109/TSA.2002.800557 Mak, 2004, Stochastic feature transformation with divergence-based out-of-handset rejection for robust speaker verification, EURASIP J. Appl. Signal Process., 4, 452, 10.1155/S1110865704308048 Mak, M.-W., Cheung, M., Kung, S., 2003. Robust speaker verification from GSM-transcoded speech based on decision fusion and feature transformation. In: Proc. Internat. Conf. 
on Acoustics, Speech, and Signal Processing (ICASSP 2003), Vol. 2, Hong Kong, China, April 2003, pp. 745–748. Mak, M.-W., Hsiao, R., Mak, B., 2006. A comparison of various adaptation methods for speaker verification with limited enrollment data. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2006), Vol. 1, Toulouse, France, May 2006, pp. 929–932. Makhoul, 1975, Linear prediction: a tutorial review, Proc. IEEE, 64, 561, 10.1109/PROC.1975.9792 Malayath, 2000, Data-driven temporal filters and alternatives to GMM in speaker verification, Digital Signal Process., 10, 55, 10.1006/dspr.1999.0363 Mami, 2006, Speaker recognition by location in the space of reference speakers, Speech Comm., 48, 127, 10.1016/j.specom.2005.06.014 Mammone, 1996, Robust speaker recognition: a feature based approach, IEEE Signal Process. Mag., 13, 58, 10.1109/79.536825 Mariéthoz, J., Bengio, S., 2002. A comparative study of adaptation methods for speaker verification. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2002), Denver, Colorado, USA, September 2002, pp. 581–584. Markel, 1977, Long-term feature averaging for speaker recognition, IEEE Trans. Acoustics, Speech, Signal Process., 25, 330, 10.1109/TASSP.1977.1162961 Martin, A., Doddington, G., Kamm, T., Ordowski, M., Przybocki, M., 1997. The DET curve in assessment of detection task performance. In: Proc. Fifth European Conf. on Speech Communication and Technology (Eurospeech 1997), Rhodos, Greece, September 1997, pp. 1895–1898. Mary, L., Yegnanarayana, B., 2006. Prosodic features for speaker verification. In: Proc. Interspeech 2006 (ICSLP), Pittsburgh, Pennsylvania, USA, September 2006, pp. 917–920. Mary, 2008, Extraction and representation of prosodic features for language and speaker recognition, Speech Comm., 50, 782, 10.1016/j.specom.2008.04.010 Mason, M., Vogt, R., Baker, B., Sridharan, S., 2005. Data-driven clustering for blind feature mapping in speaker verification. In: Proc. 
Interspeech 2005, Lisboa, Portugal, September 2005, pp. 3109–3112. McLaughlin, J., Reynolds, D., Gleason, T., 1999. A study of computation speed-ups of the GMM–UBM speaker recognition system. In: Proc. Sixth European Conf. on Speech Communication and Technology (Eurospeech 1999), Budapest, Hungary, September 1999, pp. 1215–1218. Misra, 2003, Speaker-specific mapping for text-independent speaker recognition, Speech Comm., 39, 301, 10.1016/S0167-6393(02)00046-8 Miyajima, 2001, A new approach to designing a feature extractor in speaker identification based on discriminative feature extraction, Speech Comm., 35, 203, 10.1016/S0167-6393(00)00079-0 Moonasar, V., Venayagamoorthy, G., 2001. A committee of neural networks for automatic speaker recognition (ASR) systems. In: Proc. Internat. Joint Conf. on Neural Networks (IJCNN 2001), Washington, DC, USA, July 2001, pp. 2936–2940. 2007, Vol. 4343 2007, Vol. 4441 Müller, 2001, An introduction to kernel-based learning algorithms, IEEE Trans. Neural Networks, 12, 181, 10.1109/72.914517 Murty, 2006, Combining evidence from residual phase and MFCC features for speaker recognition, IEEE Signal Process. Lett., 13, 52, 10.1109/LSP.2005.860538 Naik, J., Netsch, L., and Doddington, G. 1989. Speaker verification over long distance telephone lines. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1989) (Glasgow, May 1989), pp. 524–527. Nakasone, H., Mimikopoulos, M., Beck, S., Mathur, S., 2004. Pitch synchronized speech processing (PSSP) for speaker recognition. In: Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2004), Toledo, Spain, May 2004, pp. 251–256. Ney, 1997, Statistical language modeling using leaving-one-out, 174 Niemi-Laitinen, T., Saastamoinen, J., Kinnunen, T., Fränti, P., 2005. Applying MFCC-based automatic speaker recognition to GSM and forensic data. In: Proc. Second Baltic Conf. on Human Language Technologies (HLT’2005), Tallinn, Estonia, April 2005, pp. 317–322. 
NIST 2008 SRE results page, September 2008. <http://www.nist.gov/speech/tests/sre/2008/official_results/index.html>.
Nolan, 1983
Oppenheim, 1999
Orman, D., Arslan, L., 2001. Frequency analysis of speaker identification. In: Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2001), Crete, Greece, June 2001, pp. 219–222.
Paliwal, K., Alsteris, L., 2003. Usefulness of phase spectrum in human speech perception. In: Proc. Eighth European Conf. on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, September 2003, pp. 2117–2120.
Park, A., Hazen, T., 2002. ASR dependent techniques for speaker identification. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2002), Denver, Colorado, USA, September 2002, pp. 1337–1340.
Pelecanos, J., Sridharan, S., 2001. Feature warping for robust speaker verification. In: Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2001), Crete, Greece, June 2001, pp. 213–218.
Pelecanos, J., Myers, S., Sridharan, S., Chandran, V., 2000. Vector quantization based Gaussian modeling for speaker verification. In: Proc. Internat. Conf. on Pattern Recognition (ICPR 2000), Barcelona, Spain, September 2000, pp. 3298–3301.
Pellom, 1998, An efficient scoring algorithm for Gaussian mixture model based speaker identification, IEEE Signal Process. Lett., 5, 281, 10.1109/97.728467
Pellom, B.L., Hansen, J.H.L., 1999. An experimental study of speaker verification sensitivity to computer voice-altered imposters. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1999), Vol. 2, Phoenix, AZ, USA, March 1999, pp. 837–840.
Pfister, B., Beutler, R., 2003. Estimating the weight of evidence in forensic speaker verification. In: Proc. Eighth European Conf. on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, September 2003, pp. 701–704.
Plumpe, 1999, Modeling of the glottal flow derivative waveform with application to speaker identification, IEEE Trans. Speech Audio Process., 7, 569, 10.1109/89.784109
Poh, N., Bengio, S., 2004. Why do multi-stream, multi-band and multi-modal approaches work on biometric user authentication tasks? In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2004), Vol. 5, Montreal, Canada, May 2004, pp. 893–896.
Prasanna, 2006, Extraction of speaker-specific excitation information from linear prediction residual of speech, Speech Comm., 48, 1243, 10.1016/j.specom.2006.06.002
Przybocki, 2007, NIST speaker recognition evaluations utilizing the Mixer corpora – 2004, 2005, 2006, IEEE Trans. Audio, Speech Language Process., 15, 1951, 10.1109/TASL.2007.902489
Rabiner, 1993
Ramachandran, 2002, Speaker recognition – general classifier approaches and data fusion methods, Pattern Recognition, 35, 2801, 10.1016/S0031-3203(01)00235-7
Ramirez, 2004, Efficient voice activity detection algorithms using long-term speech information, Speech Comm., 42, 271, 10.1016/j.specom.2003.10.002
Ramos-Castro, 2007, Speaker verification using speaker- and test-dependent fast score normalization, Pattern Recognition Lett., 28, 90, 10.1016/j.patrec.2006.06.008
Reynolds, 1995, Speaker identification and verification using Gaussian mixture speaker models, Speech Comm., 17, 91, 10.1016/0167-6393(95)00009-D
Reynolds, D., 2003. Channel robust speaker verification via feature mapping. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2003), Vol. 2, Hong Kong, China, April 2003, pp. 53–56.
Reynolds, 1995, Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Trans. Speech Audio Process., 3, 72, 10.1109/89.365379
Reynolds, 2000, Speaker verification using adapted Gaussian mixture models, Digital Signal Process., 10, 19, 10.1006/dspr.1999.0361
Reynolds, D., Andrews, W., Campbell, J., Navratil, J., Peskin, B., Adami, A., Jin, Q., Klusacek, D., Abramson, J., Mihaescu, R., Godfrey, J., Jones, D., Xiang, B., 2003. The SuperSID project: exploiting high-level information for high-accuracy speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong, China, April 2003, pp. 784–787.
Reynolds, D., Campbell, W., Gleason, T., Quillen, C., Sturim, D., Torres-Carrasquillo, P., Adami, A., 2005. The 2004 MIT Lincoln Laboratory speaker recognition system. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2005), Vol. 1, Philadelphia, USA, 2005, pp. 177–180.
Roch, 2006, Gaussian-selection-based non-optimal search for speaker identification, Speech Comm., 48, 85, 10.1016/j.specom.2005.06.003
Rodríguez-Liñares, 2003, On combining classifiers for speaker authentication, Pattern Recognition, 36, 347, 10.1016/S0031-3203(02)00035-3
Rose, 2002
Saastamoinen, 2005, Accuracy of MFCC based speaker recognition in series 60 device, EURASIP J. Appl. Signal Process., 17, 2816, 10.1155/ASP.2005.2816
Saeidi, 2009, Particle swarm optimization for sorted adapted Gaussian mixture models, IEEE Trans. Audio, Speech Language Process., 17, 344, 10.1109/TASL.2008.2010278
Shriberg, 2005, Modeling prosodic feature sequences for speaker recognition, Speech Comm., 46, 455, 10.1016/j.specom.2005.02.018
Sivakumaran, 2003, Sub-band based text-dependent speaker verification, Speech Comm., 41, 485, 10.1016/S0167-6393(03)00017-7
Sivakumaran, P., Fortuna, J., Ariyaeeinia, A., 2003. Score normalization applied to open-set, text-independent speaker identification. In: Proc. Eighth European Conf. on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, September 2003, pp. 2669–2672.
Slomka, S., Sridharan, S., Chandran, V., 1998. A comparison of fusion techniques in mel-cepstral based speaker identification. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 1998), Sydney, Australia, November 1998, pp. 225–228.
Slyh, R., Hansen, E., Anderson, T., 2004. Glottal modeling and closed-phase analysis for speaker recognition. In: Proc. Speaker Odyssey: the Speaker Recognition Workshop (Odyssey 2004), Toledo, Spain, May 2004, pp. 315–322.
Solewicz, 2007, Using post-classifiers to enhance fusion of low- and high-level speaker recognition, IEEE Trans. Audio, Speech Language Process., 15, 2063, 10.1109/TASL.2007.903054
Solomonoff, A., Campbell, W., Boardman, I., 2005. Advances in channel compensation for SVM speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2005), Philadelphia, USA, March 2005, pp. 629–632.
Sönmez, M., Heck, L., Weintraub, M., Shriberg, E., 1997. A lognormal tied mixture model of pitch for prosody-based speaker recognition. In: Proc. Fifth European Conf. on Speech Communication and Technology (Eurospeech 1997), Rhodos, Greece, September 1997, pp. 1391–1394.
Sönmez, K., Shriberg, E., Heck, L., Weintraub, M., 1998. Modeling dynamic prosodic variation for speaker verification. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 1998), Sydney, Australia, November 1998, pp. 3189–3192.
Soong, 1988, On the use of instantaneous and transitional spectral information in speaker recognition, IEEE Trans. Acoustics, Speech Signal Process., 36, 871, 10.1109/29.1598
Soong, 1987, A vector quantization approach to speaker recognition, AT&T Technical J., 66, 14, 10.1002/j.1538-7305.1987.tb00198.x
Stolcke, 2007, Speaker recognition with session variability normalization based on MLLR adaptation transforms, IEEE Trans. Audio, Speech Language Process., 15, 1987, 10.1109/TASL.2007.902859
Stolcke, A., Kajarekar, S., Ferrer, L., 2008. Nonparametric feature normalization for SVM-based speaker verification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2008), Las Vegas, Nevada, April 2008, pp. 1577–1580.
Sturim, D., Reynolds, D., 2005. Speaker adaptive cohort selection for Tnorm in text-independent speaker verification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2005), Vol. 1, Philadelphia, USA, March 2005, pp. 741–744.
Sturim, D., Reynolds, D., Singer, E., Campbell, J., 2001. Speaker indexing in large audio databases using anchor models. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2001), Vol. 1, Salt Lake City, Utah, USA, May 2001, pp. 429–432.
Teunen, R., Shahshahani, B., Heck, L., 2000. A model-based transformational approach to robust speaker recognition. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP 2000), Vol. 2, Beijing, China, October 2000, pp. 495–498.
Thévenaz, 1995, Usefulness of the LPC-residue in text-independent speaker verification, Speech Comm., 17, 145, 10.1016/0167-6393(95)00010-L
Thian, N., Sanderson, C., Bengio, S., 2004. Spectral subband centroids as complementary features for speaker authentication. In: Proc. First Internat. Conf. on Biometric Authentication (ICBA 2004), Hong Kong, China, July 2004, pp. 631–639.
Thiruvaran, 2008, Extraction of FM components from speech signals using all-pole model, Electronics Lett., 44, 10.1049/el:20080147
Thiruvaran, T., Ambikairajah, E., Epps, J., 2008. FM features for automatic forensic speaker recognition. In: Proc. Interspeech 2008, Brisbane, Australia, September 2008, pp. 1497–1500.
Tong, R., Ma, B., Lee, K., You, C., Zhu, D., Kinnunen, T., Sun, H., Dong, M., Chng, E., Li, H., 2006. Fusion of acoustic and tokenization features for speaker recognition. In: Fifth Internat. Symposium on Chinese Spoken Language Processing (ISCSLP 2006), Singapore, December 2006, pp. 494–505.
Torres-Carrasquillo, P., Reynolds, D., Deller Jr., J.D., 2002. Language identification using Gaussian mixture model tokenization. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2002), Vol. 1, Orlando, Florida, USA, May 2002, pp. 757–760.
Tranter, 2006, An overview of automatic speaker diarization systems, IEEE Trans. Audio, Speech Language Process., 14, 1557, 10.1109/TASL.2006.878256
Tydlitat, B., Navratil, J., Pelecanos, J., Ramaswamy, G., 2007. Text-independent speaker verification in embedded environments. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2007), Vol. 4, Honolulu, Hawaii, April 2007, pp. 293–296.
Viikki, 1998, Cepstral domain segmental feature vector normalization for noise robust speech recognition, Speech Comm., 25, 133, 10.1016/S0167-6393(98)00033-8
Vogt, 2008, Explicit modeling of session variability for speaker verification, Comput. Speech Lang., 22, 17, 10.1016/j.csl.2007.05.003
Vogt, R., Baker, B., Sridharan, S., 2005. Modelling session variability in text-independent speaker verification. In: Proc. Interspeech 2005, Lisboa, Portugal, September 2005, pp. 3117–3120.
Vogt, R., Kajarekar, S., Sridharan, S., 2008. Discriminant NAP for SVM speaker recognition. In: The Speaker and Language Recognition Workshop (Odyssey 2008), Stellenbosch, South Africa, January 2008. Paper 010.
Wan, 2005, Speaker verification using sequence discriminant support vector machines, IEEE Trans. Speech Audio Process., 13, 203, 10.1109/TSA.2004.841042
Wildermoth, B., Paliwal, K., 2000. Use of voicing and pitch information for speaker recognition. In: Proc. Eighth Australian Internat. Conf. on Speech Science and Technology, Canberra, December 2000, pp. 324–328.
Wolf, 1972, Efficient acoustic parameters for speaker recognition, J. Acoust. Soc. Amer., 51, 2044, 10.1121/1.1913065
Xiang, 2003, Text-independent speaker verification with dynamic trajectory model, IEEE Signal Process. Lett., 10, 141, 10.1109/LSP.2003.810913
Xiang, 2003, Efficient text-independent speaker verification with structural Gaussian mixture models and neural network, IEEE Trans. Speech Audio Process., 11, 447, 10.1109/TSA.2003.815822
Xiang, B., Chaudhari, U., Navratil, J., Ramaswamy, G., Gopinath, R., 2002. Short-time Gaussianization for robust speaker verification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2002), Vol. 1, Orlando, Florida, USA, May 2002, pp. 681–684.
Xiong, 2006, A tree-based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification, Speech Comm., 48, 1273, 10.1016/j.specom.2006.06.011
Yegnanarayana, 2002, AANN: an alternative to GMM for pattern recognition, Neural Networks, 15, 459, 10.1016/S0893-6080(02)00019-9
You, 2009, An SVM kernel with GMM-supervector based on the Bhattacharyya distance for speaker recognition, IEEE Signal Process. Lett., 16, 49, 10.1109/LSP.2008.2006711
Yuo, 1999, Joint estimation of feature transformation parameters and Gaussian mixture model for speaker identification, Speech Comm., 28, 227, 10.1016/S0167-6393(99)00017-5
Zheng, 2007, Integration of complementary acoustic features for speaker recognition, IEEE Signal Process. Lett., 14, 181, 10.1109/LSP.2006.884031
Zhu, D., Ma, B., Li, H., Huo, Q., 2007. A generalized feature transformation approach for channel robust speaker verification. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2007), Vol. 4, Honolulu, Hawaii, April 2007, pp. 61–64.
Zhu, D., Ma, B., Li, H., 2008. Using MAP estimation of feature transformation for speaker recognition. In: Proc. Interspeech 2008, Brisbane, Australia, September 2008.
Zhu, D., Ma, B., Li, H., 2009. Joint MAP adaptation of feature transformation and Gaussian mixture model for speaker recognition. In: Proc. Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4045–4048.
Zilca, 2002, Text-independent speaker verification using utterance level scoring and covariance modeling, IEEE Trans. Speech Audio Process., 10, 363, 10.1109/TSA.2002.803419
Zilca, 2006, Pseudo pitch synchronous analysis of speech with applications to speaker recognition, IEEE Trans. Audio, Speech Language Process., 14, 467, 10.1109/TSA.2005.857809
Zissman, 1996, Comparison of four approaches to automatic language identification of telephone speech, IEEE Trans. Speech Audio Process., 4, 31, 10.1109/TSA.1996.481450