Large vocabulary Russian speech recognition using syntactico-statistical language modeling

Speech Communication - Tập 56 - Trang 213-228 - 2014
Alexey Karpov1, Konstantin Markov2, Irina Kipyatkova1, Daria Vazhenina2, Andrey Ronzhin1
1St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), St. Petersburg, Russia
2Human Interface Laboratory, The University of Aizu, Fukushima, Japan

Tài liệu tham khảo

Anisimovich, K., Druzhkin, K., Minlos, F., Petrova, M., Selegey, V., Zuev, K., 2012. Syntactic and semantic parser based on ABBYY Compreno linguistic technologies. In: Proc. Dialogue-2012, Moscow, Russia, vol. 2, pp. 91–103. Antonova, A., Misyurev, A., 2012. Russian dependency parser SyntAutom at the Dialogue-2012 parser evaluation task. In: Proc. Int. Conf. Dialogue-2012, Moscow, Russia, vol. 2, pp. 104–118. Arisoy, E., Saraclar, M., Roark, B., Shafran, I., 2010. Syntactic and sub-lexical features for Turkish discriminative language models. In: Proc. Int. Conf. ICASSP’2010, Dallas, USA, pp. 5538–5541. Arlazarov, V., Bogdanov, D., Krivnova, O., Podrabinovich, A., 2004. Creation of Russian speech databases: design, processing, development tools. In: Proc. Int. Conf. SPECOM’2004, St. Petersburg, Russia, pp. 650–656. Bechet, F., Nasr, A., 2009. Robust dependency parsing for spoken language understanding of spontaneous speech. In: Proc. Interspeech’2009, Brighton, UK, pp. 1039–1042. Bellegarda, 2004, Statistical language model adaptation: review and perspectives, Speech Commun., 42, 93, 10.1016/j.specom.2003.08.002 Bhanuprasad, K., Svenson, M., 2008. Errgrams – a way to improving ASR for highly inflective Dravidian languages. In: Proc. 3rd Int. Joint Conf. on Natural Language Processing IJCNLP’2008, India, pp. 805–810. Chelba, 2000, Structured language model, Comput. Speech Lang., 10, 283, 10.1006/csla.2000.0147 Cubberley, 2002 Deoras, A., Sarikaya, R., Tur, G., Hakkani-Tur, D., 2012. Joint decoding for speech recognition and semantic tagging. In: Proc. Interspeech’2012, Portland, Oregon, USA. Huet, 2010, Morpho-syntactic postprocessing of N-best lists for improved French automatic speech recognition, Comput. Speech Lang., 24, 663, 10.1016/j.csl.2009.10.001 Iomdin, L., Petrochenkov, V., Sizov, V., Tsinman, L., 2012. ETAP parser: state of the art. In: Proc. Dialogue-2012, Moscow, Russia, vol. 2, pp. 119–131. Ircing, P., Hoidekr, J., Psutka, J., 2006. Exploiting linguistic knowledge in language modeling of Czech spontaneous speech. In: Proc. Int. Conf. on Language Resources and Evaluation LREC’2006, Genoa, Italy, pp. 2600–2603. Jokisch, O., Wagner, A., Sabo, R., Jaeckel, R., Cylwik, N., Rusko, M., Ronzhin, A., Hoffmann, R., 2009. Multilingual speech data collection for the assessment of pronunciation and prosody in a language learning system. In: Proc. SPECOM’2009, St. Petersburg, Russia, pp. 515–520. Kanejiya, D.P., Kumar, A., Prasad, S., 2003. Statistical language modeling using syntactically enhanced LSA. In: Proc. TIFR Workshop on Spoken Language Processing, Mumbai, India, pp. 93–100. Kanevsky, D., Monkowski, M., Sedivy, J., 1996. Large vocabulary speaker-independent continuous speech recognition in Russian language. In: Proc. SPECOM’1996, St. Petersburg, Russia, pp. 117–121. Karpov, A., Kipyatkova, I., Ronzhin, A., 2011. Very large vocabulary ASR for spoken Russian with syntactic and morphemic analysis. In: Proc. Interspeech’2011, Florence, Italy, pp. 3161–3164. Karpov, A., Kipyatkova, I., Ronzhin, A., 2012. Speech recognition for East Slavic languages: the case of Russian. In: Proc. 3rd International Workshop on Spoken Languages Technologies for Under-resourced Languages SLTU’2012, Cape Town, RSA, 2012, pp. 84–89. Kipyatkova, I., Karpov, A., Verkhodanova, V., Zelezny, M., 2012. Analysis of long-distance word dependencies and pronunciation variability at conversational Russian speech recognition, In: Proc. Federated Conference on Computer Science and Information Systems FedCSIS-2012, Wroclaw, Poland, pp. 719–725. Kouznetsov, V., Chuchupal, V., Makovkin, K. Chichagov, A., 1999. Design and implementation of a Russian telephone speech database. In: Proc. SPECOM’1999, Moscow, Russia, pp. 179–181. Kuo, H.-K.J., Mangu, L., Emami, A., Zitouni, I., Lee, Y.-S., 2009. Syntactic features for Arabic speech recognition. In: Proc. International Workshop ASRU’2009, Merano, Italy, pp. 327–332. Kurimo, M. et al., 2006. Unlimited vocabulary speech recognition for agglutinative languages. In: Proc. Human Language Technology Conference of the North American Chapter of the ACL, New York, USA, pp. 487–494. Lamel, L. et al., 2011. Speech recognition for machine translation in Quaero. In: Proc. International Workshop on Spoken Language Translation IWSLT’2011, San Francisco, USA, pp. 121–128. Lamel, 2012, Transcription of Russian conversational speech, 156 Lee, A., Kawahara, T., Recent development of open-source speech recognition engine julius. In: Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2009), Sapporo, Japan, pp. 131–137. Leontyeva, A., Kagirov, I., 2008. The module of morphological and syntactic analysis SMART. In: Proc. Int. Conf. on Text, Speech and Dialogue TSD’2008, LNAI 5246, Brno, Czech Republic, pp. 373–380. Moore, G.L., 2001. Adaptive Statistical Class-based Language Modelling. PhD thesis, Cambridge University. Nozhov, I., 2003. Realization of automatic syntactic segmentation of a Russian sentence. PhD thesis, p. 140 (in Russian). http://www.aot.ru/docs/Nozhov/msot.pdf. Odell, J., 1995. The use of context in large vocabulary speech recognition, PhD thesis, Cambridge Univ. Oparin, I., Talanov, A., 2005. Stem-based approach to pronunciation vocabulary construction and language modeling for Russian. In: Proc. SPECOM’2005, Patras, Greece, pp. 575–578. Oparin, I., Glembek, O., Burget, L., Cernocky, J., 2008. Morphological random forests for language modeling of inflectional languages. In: Proc. IEEE Spoken Language Technology Workshop SLT’2008, Goa, India, pp. 189–192. Padgett, 2005, Adaptive dispersion theory and phonological vowel reduction in Russian, Phonetica, 62, 14, 10.1159/000087223 Potapova, R., 2011. Multilingual spoken language databases in Russia. In: Proc. International Conference Speech and Computer SPECOM’2011, Kazan, Russia, pp. 13–17. Psutka, J., Ircing, P., Psutka, J.V., Hajic, J., Byrne, W.J., Mirovsky, J., 2005. Automatic transcription of Czech, Russian, and Slovak spontaneous speech in the MALACH project. In: Proc. Interspeech’2005, Lisbon, Portugal, pp. 1349–1352. Pylypenko, V., 2007. Extra large vocabulary continuous speech recognition algorithm based on information retrieval. In: Proc. Interspeech’2007, Antwerp, Belgium, pp. 1809–1812. Rastrow, A., Dredze, M., Khudanpur, S., 2012. Fast syntactic analysis for statistical language modeling via substructure sharing and uptraining. In: Proc. 50th Annual Meeting of Association for Computational Linguistics ACL’2012, Jeju, Korea, pp. 175–183. Roark, B., 2002. Markov parsing: lattice rescoring with a statistical parser. In: Proc. 40th Annual Meeting of the Association for Computational Linguistics ACL’2002, Philadelphia, USA, pp. 287–294. Ronzhin, A., Karpov, A., 2004. Automatic system for Russian speech recognition SIRIUS. In: Proc. SPECOM’2004, St. Petersburg, Russia, pp. 291–296. Schalkwyk, J., Beeferman, D., Beaufays, F., Byrne, B., Chelba, C., Cohen, M., Kamvar, M., Strope, B., 2010. Google search by voice: a case study. In: Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, pp. 61–90. Schultz, T., Waibel, A., 1998. Development of multilingual acoustic models in the GlobalPhone project. In: Proc. TSD’1998, Brno, Czech Republic, pp. 311–316. Shirokova, A., 2007. STEL speech database for speaker recognition and multispeaker segmentation. In: Proc. SPECOM’2007, Moscow, Russia, pp. 877–881. Shvedova, N. et al., 1980. Russian Grammar, vol. 1, Moscow, p. 783 (in Russian). Sidorov, 2012, Syntactic dependency-based n-grams as classification features, 7630, 1 Singh, 2002, Automatic generation of subword units for speech recognition systems, IEEE Trans. Acoust. Speech Signal Process., 10, 89, 10.1109/89.985546 Skatov, D., Okat’ev, V., Patanova, T., Erekhinskaya, T., 2012. Dictascope Syntax: the Natural Language Syntax Parser, http://dialog-21.ru/digests/dialog2012/materials/pdf/Скатов.pdf. Skrelin, P., Volskaya, N., Kocharov, D., Evgrafova, K., Glotova, O., Evdokimova, V., 2010. CORPRES – Corpus of Russian professionally read speech. In: Proc. TSD’2010, Brno, Czech Republic, pp. 392–399. Smirnova, J., 2011. Compound systems of pretonic vocalism after palatalized consonants in Russian dialects: a synchronic and diachronic analysis. In: Proc. 17th Int. Cong. of Phonetic Sciences ICPhS’2011, Hong Kong, pp. 1870–1873. Sokirko, A., 2004. Morphological modules on the website www.aot.ru. In: Proc. Dialogue-2004, Protvino, Russia, pp. 559–564 (in Russian). Starostin, A., Mal’kovskiy, M., 2007. Algorithm of syntax analysis employed by the “Treeton” morpho-syntactic analysis system. In: Proc Int. Conf. “Dialogue-2007, Moscow, Russia, pp. 516–524 (in Russian). Stolcke, A., Zheng, J., Wang, W., Abrash, V., 2011. SRILM at sixteen: update and outlook. In: Proc. IEEE Automatic Speech Recognition and Understanding Workshop ASRU’2011, Waikoloa, Hawaii, USA. Stüker, S., 2008. Modified polyphone decision tree specialization for porting multilingual grapheme based ASR systems to new languages. In: Proc. ICASSP’2008, Las Vegas, Nevada, USA, pp. 4249–4252. Stüker, S., Schultz, T., 2004. A grapheme based speech recognition system for Russian. In: Proc. Int. Conf. SPECOM’2004, St. Petersburg, Russia, pp. 297–303. Szarvas, M., Furui, S., 2003. Finite-state transducer based modeling of morphosyntax with applications to Hungarian LVCSR. In: Proc. ICASSP’2003, Hong Kong, China, pp. 368–371. Tatarnikova, M., Tampel, I., Oparin, I., Khokhlov, Y., 2006. Building acoustic models for a large vocabulary continuous speech recognizer for Russian. In: Proc. SPECOM’2006, St. Petersburg, Russia, pp. 83–87. Vaiciunas, A., 2006. Statistical Language Models of Lithuanian and Their Application to Very Large Vocabulary Speech Recognition. PhD thesis, Vytautas Magnus University, Kaunas. Vazhenina, D., Markov, K., 2011. Phoneme set selection for Russian speech recognition. In: Proc. Int. Conf. on Natural Language Processing and Knowledge Engineering NLP-KE, Tokushima, Japan, pp. 475–478. Vazhenina, D., Kipyatkova, I., Markov, K., Karpov, A., 2012. State-of-the-art speech recognition technologies for Russian language. In: Proc. Joint Int. Conf. on Human-Centered Computer Environments HCCE’2012, ACM, Aizu, Japan, pp. 59–63. Viktorov, 2009, Universal technique for preparing components for training of a speech recognition system, Speech Technol., 2, 39 Vintsyuk, 1968, Speech discrimination by dynamic programming, Kibernetica, 1, 15 Whittaker, E.W.D., 2000. Statistical language modelling for automatic speech recognition of Russian and English, PhD thesis, Cambridge Univ., p. 140. Whittaker, E.W.D., Woodland, P.C., 2001. Efficient class-based language modelling for very large vocabularies. In: Proc. ICASSP’2001, Salt Lake City, USA, pp. 545–548. Young, S., Odell, J., Woodland, P., 1994. Tree-based state tying for high accuracy acoustic modelling. In: Proc. Int. Workshop on Human Language Technology HLT’1994, Stroudsburg, PA, USA, pp. 307–312. Young, 2009 Zablotskiy, S., Shvets, A., Sidorov, M., Semenkin, E., Minker, W., 2012. Speech and language recources for LVCSR of Russian. In: Proc. LREC’2012, Istanbul, Turkey, pp. 3374–3377. Zaliznjak, 2003 Zhang, 2008, Using mutual information criterion to design an efficient phoneme set for Chinese speech recognition, IEICE Trans. Inform. Syst., E91-D, 508, 10.1093/ietisy/e91-d.3.508