A heterogeneous speech feature vectors generation approach with hybrid HMM classifiers
Abstract
Automatic speech recognition (ASR) systems play a vital role in human–machine interaction. They suffer performance degradation when the training and testing phases are inconsistent, which arises from the extraction and representation of erroneous or redundant feature vectors. This paper proposes three different combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, MFCC, RASTA-PLP, and PLP are combined in different ways. In the modeling phase, the mean and variance are calculated to generate the inter- and intra-class feature vectors. An optimization algorithm then refines these feature vectors in combination with a traditional statistical technique: the approach pairs a genetic algorithm (GA) and differential evolution (DE) with hidden Markov models (HMM), as GA + HMM and DE + HMM, to produce refined model parameters. The experiments are conducted on datasets of large-vocabulary isolated Punjabi lexicons. The simulation results show a performance improvement with MFCC and the DE + HMM technique compared with RASTA-PLP and PLP using hybrid HMM classifiers.
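The abstract does not spell out how the feature streams are combined or how the class statistics are formed, so the following is only a minimal sketch under stated assumptions: streams are joined by frame-wise concatenation, intra-class statistics are the per-dimension mean and variance within each word class, and inter-class spread is taken as the variance of the class means. The function names `combine_streams` and `class_statistics` and the toy values are hypothetical and do not come from the paper.

```python
from statistics import fmean, pvariance


def combine_streams(*streams):
    """Concatenate per-frame vectors from several feature streams.

    Assumption: every stream is a list of frames (lists of floats) and all
    streams are already aligned to the same number of frames.
    """
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must have the same number of frames")
    # For each frame index, flatten the vectors of all streams into one vector.
    return [[v for frame in frames for v in frame] for frames in zip(*streams)]


def class_statistics(frames_by_class):
    """Compute per-class and across-class statistics per feature dimension.

    Returns (means, variances, inter), where means[label] and
    variances[label] describe the frames inside one class (intra-class),
    and inter is the population variance of the class means (an assumed
    definition of inter-class spread).
    """
    means, variances = {}, {}
    for label, frames in frames_by_class.items():
        columns = list(zip(*frames))  # one tuple per feature dimension
        means[label] = [fmean(c) for c in columns]
        variances[label] = [pvariance(c) for c in columns]
    inter = [pvariance(c) for c in zip(*means.values())]
    return means, variances, inter


# Toy stand-ins for two aligned streams (e.g. 2-D "MFCC" and 1-D "PLP" frames).
mfcc_like = [[1.0, 2.0], [3.0, 4.0]]
plp_like = [[10.0], [20.0]]
combined = combine_streams(mfcc_like, plp_like)

means, variances, inter = class_statistics({
    "word_a": combined,
    "word_b": [[5.0, 6.0, 30.0], [7.0, 8.0, 50.0]],
})
print(combined)
print(means, variances, inter)
```

In a pipeline of the kind the abstract describes, statistics like these would presumably seed the candidate parameters that the GA or DE search refines before HMM training; that wiring is not shown here because the abstract gives no detail on it.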