Front end analysis of speech recognition: a review

International Journal of Speech Technology - Volume 14 - Pages 99-145 - 2011
M. A. Anusuya, S. K. Katti
Department of Computer Science, SJCE, Mysore, India

Abstract

Automatic speech recognition (ASR) has made great strides with the development of digital signal processing hardware and software. Despite these advances, machines cannot match the performance of their human counterparts in accuracy and speed, especially in speaker-independent speech recognition. A significant portion of current speech recognition research therefore focuses on the speaker-independent recognition problem. Before recognition, the speech signal must be processed to obtain its feature vectors, so front end analysis plays an important role. The reasons are its wide range of applications and the limitations of available speech recognition techniques. In this report we briefly discuss the different aspects of front end analysis for speech recognition, including sound characteristics, feature extraction techniques, and spectral representations of the speech signal. We also discuss the advantages and disadvantages of each feature extraction technique, along with the suitability of each method for particular applications.
