A heterogeneous speech feature vectors generation approach with hybrid HMM classifiers

International Journal of Speech Technology - Volume 20 - Pages 761-769 - 2017

Virender Kadyan¹, Archana Mantri², R. K. Aggarwal³

¹Department of Computer Science & Engineering, Chitkara University Institute of Engineering & Technology, Chitkara University, Punjab, India

²Department of Electronics & Communication Engineering, Chitkara University Institute of Engineering & Technology, Chitkara University, Punjab, India

³Department of Computer Engineering, N.I.T., Kurukshetra, India

Abstract

Automatic speech recognition (ASR) systems play a vital role in human–machine interaction. An ASR system faces performance degradation caused by inconsistency between the training and testing phases, which arises from the extraction and representation of erroneous, redundant feature vectors. This paper proposes three different combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, MFCC, RASTA-PLP, and PLP are combined in different ways. In the modeling phase, the mean and variance are calculated to generate the inter- and intra-class feature vectors. These feature vectors are then passed to an optimization algorithm, which generates refined feature vectors together with a traditional statistical technique. The approach uses GA + HMM and DE + HMM techniques to produce refined model parameters. The experiments are conducted on datasets of large-vocabulary isolated Punjabi lexicons. The simulation results show a performance improvement using MFCC with the DE + HMM technique when compared with RASTA-PLP and PLP using hybrid HMM classifiers.
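The abstract does not spell out the optimization step, so the following is only a minimal sketch of how a population-based refinement of a parameter vector could look. It assumes that DE stands for differential evolution and uses the common DE/rand/1/bin variant. The function names, the control parameters (`f`, `cr`, `pop_size`), and the toy fitness function are illustrative assumptions, not the authors' setup. In the paper's setting the fitness would presumably be tied to the HMM, which is not reproduced here.

```python
import random


def differential_evolution(fitness, dim, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Minimal DE/rand/1/bin minimizer (illustrative, not the paper's setup)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population of candidate parameter vectors.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    # Mutation: base vector plus a scaled difference vector.
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_score = fitness(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_fitness(vec):
    # Hypothetical stand-in objective: squared distance to a fixed target.
    target = [1.0, -2.0, 0.5]
    return sum((v - t) ** 2 for v, t in zip(vec, target))


best_vec, best_score = differential_evolution(toy_fitness, dim=3,
                                              bounds=(-5.0, 5.0))
```

A GA-based variant would replace the difference-vector mutation with crossover and random mutation of selected parents. Either way, the objective that scores each candidate is the part that would need to come from the HMM model.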

