GMM based language identification system using robust features

International Journal of Speech Technology - Volume 17 - Pages 99-105 - 2013
Sadanandam Manchala (1), V. Kamakshi Prasad (2), V. Janaki (3)
(1) Kakatiya University, Warangal, India
(2) Jawaharlal Nehru Technological University, Hyderabad, India
(3) Vagdevi Engineering College, Warangal, India

Abstract

In this work, we propose new feature vectors for a spoken language identification (LID) system. Mel frequency cepstral coefficients (MFCC) and formant frequencies are derived from short-time windows of the speech signal. The formant frequencies are extracted by linear prediction (LP) analysis of the speech signal. From these two kinds of features, new feature vectors are derived using cluster-based computation. A Gaussian mixture model (GMM) based classifier is designed using these new feature vectors. Language-specific a priori knowledge is applied to the recognition output. Experiments carried out on the OGI database show improved LID recognition performance.
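
The decision rule behind a GMM based classifier can be illustrated with a minimal sketch: one mixture model per language, with an utterance assigned to the language whose model gives the highest total log-likelihood over its feature frames. This is not the authors' implementation. The function names, the two-dimensional toy features, and all model parameters below are hypothetical, diagonal covariances are assumed, and both the cluster-based feature derivation and the a priori knowledge step described in the abstract are left out. In practice the parameters would be estimated from training data, for example with the EM algorithm.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm), where gmm is a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_language(frames, models):
    """Pick the language whose GMM maximises the summed frame log-likelihood."""
    scores = {
        lang: sum(gmm_log_likelihood(x, gmm) for x in frames)
        for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models: two mixture components per language, 2-D features.
models = {
    "lang_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "lang_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Hypothetical feature frames standing in for per-frame feature vectors.
frames = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.6]]

best, scores = identify_language(frames, models)
print(best)
```

Summing frame log-likelihoods treats the frames as independent given the language, which is the usual simplifying assumption for this kind of scorer.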
