Automatic speech recognition for under-resourced languages: A survey

Speech Communication - Volume 56 - Pages 85-100 - 2014
Laurent Besacier¹, Etienne Barnard², Alexey Karpov³, Tanja Schultz⁴
¹Laboratory of Informatics of Grenoble, Grenoble, France
²North-West University, Vanderbijlpark, South Africa
³St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
⁴Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

References

Berment, V., 2004. Méthodes pour informatiser des langues et des groupes de langues peu dotées. Ph.D. Thesis, J. Fourier University – Grenoble I, May 2004.

Besacier, L., Zhou, B., Gao, Y., 2006. Towards speech translation of non written languages. In: IEEE/ACL SLT 2006. Aruba, December 2006.

Charniak, E., Knight, K., Yamada, K., 2003. Syntax-based language models for machine translation. In: Proc. IX MT Summit, New Orleans, USA, pp. 40–46.

Constantinescu, A., Chollet, G., 1997. On cross-language experiments and data-driven units for ALISP. In: Proc. Automatic Speech Recognition and Understanding (ASRU), Santa Barbara, CA, pp. 606–613.

Creutz, M., Lagus, K., 2005. Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Computer and Information Science, Report A81, Helsinki University of Technology, Finland.

Creutz, 2007. Morph-based speech recognition and modeling of out-of-vocabulary words across languages. ACM Transactions on Speech and Language Processing 5. http://dx.doi.org/10.1145/1322391.1322394.

Cucu, H., Besacier, L., Burileanu, C., Buzo, A., 2012. ASR domain adaptation methods for low-resourced languages: application to Romanian language. In: EUSIPCO’2012, Bucharest, Romania.

Denoual, E., Lepage, Y., 2006. The character as an appropriate unit of processing for non-segmenting languages. In: NLP Annual Meeting, Tokyo, Japan, pp. 731–734.

Do, T., Besacier, L., Castelli, E., 2010. Unsupervised SMT for a low-resourced language pair. In: Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Penang, Malaysia.

Dugast, C., Aubert, X., Kneser, R., 1995. The Philips large-vocabulary recognition system for American English, French, and German. In: Proc. Eurospeech, Madrid, pp. 197–200.

Ekpenyong, M., Urua, E.-A., Watts, O., King, S., Yamagishi, J., 2013. Statistical parametric speech synthesis for Ibibio. Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.02.003.

Gelas, H., Besacier, L., Rossato, S., Pellegrino, F., 2010. Using automatic speech recognition for phonological purposes: study of vowel length in Punu (Bantu B40). In: Laphon 12, New Mexico (US), July 2010.

Glass, 1995. Multi-lingual spoken language understanding in the MIT Voyager system. Speech Communication 17, 1. http://dx.doi.org/10.1016/0167-6393(95)00008-C.

Gokcen, S., Gokcen, J., 1997. A multilingual phoneme and model set: towards a universal base for automatic speech recognition. In: Proc. Automatic Speech Recognition and Understanding (ASRU), Santa Barbara, CA, pp. 599–603.

Kanejiya, D.P., Kumar, A., Prasad, S., 2003. Statistical language modeling using syntactically enhanced LSA. In: Proc. TIFR Workshop on Spoken Language Processing, Mumbai, India, pp. 93–100.

Krauwer, S., 2003. The basic language resource kit (BLARK) as the first milestone for the language resources roadmap. In: Proceedings of the 2003 International Workshop Speech and Computer SPECOM-2003, Moscow, Russia, pp. 8–15.

Lee, 2009. Probabilistic modeling of Korean morphology. IEEE Transactions on Audio, Speech & Language Processing 17, 945. http://dx.doi.org/10.1109/TASL.2009.2019922.

Mohamed, 2012. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20, 14. http://dx.doi.org/10.1109/TASL.2011.2109382.

The US NIST 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation Plan, 2009.

Patel, 2009. A comparative study of speech and dialed input voice interfaces in rural India, 51.

Patel, 2010. Avaaj Otalo: a field study of an interactive voice forum for small farmers in rural India, 733.

Pellegrini, 2009. Automatic word decompounding for ASR in a morphologically rich language: application to Amharic. IEEE Transactions on Audio, Speech & Language Processing 17, 863. http://dx.doi.org/10.1109/TASL.2009.2022295.

Roux, J.C., Botha, E.C., du Preez, J.A., 2000. Developing a multilingual telephone based information retrieval system in African languages. In: Proceedings of the Second International Conference on Language Resources and Evaluation, pp. 975–980.

Schultz, T., Waibel, A., 1998. Language independent and language adaptive LVCSR. In: Proc. ICSLP, Sydney, pp. 1819–1822.

Schultz, 2001. Language independent and language adaptive acoustic modeling for speech recognition. Speech Communication 35, 31. http://dx.doi.org/10.1016/S0167-6393(00)00094-7.

Stolcke, A., Grezl, F., Hwang, M.-Y., Lei, X., Morgan, N., Vergyri, D., 2006. Cross-domain and cross-lingual portability of acoustic features estimated by multilayer perceptrons. In: Proc. ICASSP 2006.

Stüker, S., 2008. Integrating Thai grapheme based acoustic models into the ML-mix framework – for language independent and cross-language ASR. In: SLTU’08, Hanoi, Vietnam.

van Niekerk, D.R., Barnard, E., 2013. Predicting utterance pitch targets in Yoruba for tone realisation in speech synthesis. Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.01.009.

Vu, N.T., Metze, F., Schultz, T., 2012a. Multilingual bottle-neck feature for under resourced languages. In: Proc. SLTU, South Africa.

Whittaker, E.W.D., 2000. Statistical language modelling for automatic speech recognition of Russian and English. Ph.D. thesis, Cambridge Univ., p. 140.

Wissing, 2008. Vowel variations in Southern Sotho: an acoustical investigation. Southern African Linguistics and Applied Language Studies 26, 255. http://dx.doi.org/10.2989/SALALS.2008.26.2.6.570.