Russian voice interface

Pattern Recognition and Image Analysis - Volume 17 - Pages 321-336 - 2007
A. L. Ronzhin (1), A. A. Karpov (1)
(1) St. Petersburg Institute for Informatics and Automation, Russian Academy of Sciences, St. Petersburg, Russia

Abstract

This paper describes SIRIUS, a system for recognition of continuous Russian speech developed by the speech informatics group of SPIIRAS. The distinguishing feature of the system is that language and speech are represented at the morphemic level, which significantly reduces the size of the recognition vocabulary and increases the processing rate. We describe the introduction of the Russian speech recognition system into the area of infotelecommunications, where it provides voice access to the Internet version of the electronic catalogue “Yellow Pages of Saint Petersburg” with the aim of creating an automated call center for answering subscribers’ calls. We present the results of testing the system on speech samples recorded both in office conditions and during telephone conversations.
