Robust several-speaker speech recognition with highly dependable online speaker adaptation and identification

Journal of Network and Computer Applications - Volume 34 - Pages 1459-1467 - 2011
Po-Yi Shih¹, Po-Chuan Lin², Jhing-Fa Wang¹, Yuan-Ning Lin¹
¹Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
²Department of Electronics Engineering and Computer Science, Tung Fang Institute of Technology, Kaohsiung, Taiwan

References

Campbell, 2006, Support vector machines using GMM supervectors for speaker verification, IEEE Signal Processing Letters, 13, 308, 10.1109/LSP.2006.870086
Cerva, 2005, Supervised and Unsupervised Speaker Adaptation in Large Vocabulary Continuous Speech Recognition of Czech, 3658, 203
Chen K-T, Liau W-W, Wang H-M, Lee L-S. Fast speaker adaptation using eigenspace-based maximum-likelihood linear regression. In: Proceedings of the ICSLP, vol. 3; 2000. pp. 742–45.
Clarkson, 2001, Speaker identification for security systems using reinforcement-trained pRAM neural network architectures, IEEE Transactions on Systems, Man and Cybernetics, Part C, 31, 65, 10.1109/5326.923269
Doddington, 2000, The NIST speaker recognition evaluation – overview, methodology, systems, results, perspective, Speech Communication, 31, 225, 10.1016/S0167-6393(99)00080-1
Ganapathiraju, 2004, Applications of support vector machines to speech recognition, IEEE Transactions on Signal Processing, 52, 2348, 10.1109/TSP.2004.831018
Gauvain, 1994, Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Transactions on Speech and Audio Processing, 2, 291, 10.1109/89.279278
Huang, 2007, ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data, BioSystems, 90, 516, 10.1016/j.biosystems.2006.12.003
IBM Via Voice V10. 〈http://www.nuance.com/viavoice/〉.
Jiang, 2001, A Bayesian approach to the verification problem: applications to speaker verification, IEEE Transactions on Speech and Audio Processing, 9, 874, 10.1109/89.966090
Jolliffe, 1986
Koo, 2001, Speech recognition and utterance verification based on a generalized confidence score, IEEE Transactions on Speech and Audio Processing, 9
Kuhn R, Nguyen P, Goldwasser J-C, Niedzielski L, Junqua N, Fincke S, et al. Eigenvoices for speaker adaptation. In: Proceedings of the ICSLP’98; 1998. pp. 1771–74.
Leggetter, 1995, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech and Language, 9, 171, 10.1006/csla.1995.0010
Lee C-H. A tutorial on speaker and speech verification. In: Proceedings of the NORSIG-98, Vigso, Denmark; 1998. pp. 9–16.
Li, 2001, A detection approach to search-space reduction for HMM state alignment in speaker verification, IEEE Transactions on Speech and Audio Processing, 9, 569, 10.1109/89.928921
Mak Brian K-W, Kwok James T, Ho S. Using Kernel PCA to improve eigenvoice speaker adaptation. In: Proceedings of the third international conference on machine learning and cybernetics, Shanghai, 26–29 August 2004.
Mak, 2007, Kernel eigenspace-based MLLR adaptation, IEEE Transactions on Audio, Speech, and Language Processing, 15, 10.1109/TASL.2006.885941
Microsoft Speech Recognition Engine. SAPI (5.3). 〈http://msdn.microsoft.com/〉.
Modi P, Rahim M. Discriminative utterance verification using multiple confidence measures. In: Proceedings of the EUROSPEECH’97; 1997. pp. 103–06.
Rahim, 1997, Discriminative utterance verification for connected digits recognition, IEEE Transactions on Speech and Audio Processing, 5, 266, 10.1109/89.568733
Rahim M, Lee C-H, Juang B-H, Chou W. Discriminative utterance verification using minimum string verification error (MSVE) training. In: Proceedings of the IEEE ICASSP’96; 1996. pp. 3585–88.
Sanchis A, Juan A, Vidal E. Improving utterance verification using a smoothed naive Bayes model. In: Proceedings of the ICASSP’2003, vol. 1; 2003. pp. 592–95.
Sanchis A, Juan A, Vidal E. Estimating confidence measures for speech recognition verification using a smoothed naïve Bayes model. In: IbPRIA 2003 Proceedings. Lecture Notes in Computer Science (LNCS), vol. 2652; 2003. pp. 910–18.
Sanchis A, Juan A, Vidal E. New features based on multiple word graphs for utterance verification. In: Proceedings of the 8th international conference on spoken language processing; 2004. pp. 2545–48.
Tseng CY. A phonetically oriented speech database for Mandarin Chinese. In: Proceedings of the ICPhS95, Stockholm; 1995. pp. 326–29.
Vapnik, 1998
Wang J-C, Wang J-F, Lin C-B, Jian K-T, Kuok W-H. Content-based audio classification using support vector machines and independent component analysis. In: Proceedings of the ICPR06; 2006 (I: pp. 1204–07).
Wang, 2007, Robust speaker identification and verification, IEEE Computational Intelligence Magazine, 52, 10.1109/MCI.2007.353420
Wang, 2007, Critical band subspace-based speech enhancement using SNR and auditory masking aware technique, IEICE Transactions on Information and Systems, 90, 1055, 10.1093/ietisy/e90-d.7.1055
Woodland PC. Speaker adaptation: techniques and challenges. In: Proceedings of the IEEE workshop on automatic speech recognition and understanding; 2000. pp. 85–90.
Xiang, 2003, Efficient text-independent speaker verification with structural Gaussian mixture models and neural network, IEEE Transactions on Speech and Audio Processing, 11, 447, 10.1109/TSA.2003.815822
Young, 2002