Automatic speech recognition for under-resourced languages: A survey

Speech Communication - Volume 56 - Pages 85-100 - 2014

Laurent Besacier¹, Etienne Barnard², Alexey Karpov³, Tanja Schultz⁴

¹Laboratory of Informatics of Grenoble, Grenoble, France

²North-West University, Vanderbijlpark, South Africa

³St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia

⁴Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

References

Abdillahi, N., Nocera, P., Bonastre, J.-F., 2006. Automatic transcription of Somali language. In: ICSLP’06, Pittsburgh, PA, USA, pp. 289–292.

Ablimit, M., Neubig, G., Mimura, M., Mori, S., Kawahara, T., Hamdulla, A., 2010. Uyghur Morpheme-based language models and ASR. In: Proc. IEEE 10th International Conference on Signal Processing (ICSP), Beijing, China, pp. 581–584.

Adda-Decker, M., 2003. A corpus-based decompounding algorithm for German lexical modeling in LVCSR. In: Proc. Eurospeech-2003, Geneva, Switzerland, pp. 257–260.

Arisoy, 2006, A unified language model for large vocabulary continuous speech recognition of Turkish, Signal Processing, 86, 2844, 10.1016/j.sigpro.2005.12.002

Arisoy, E., Sainath, T.N., Kingsbury, B., Ramabhadran, B., 2012. Deep neural network language models. In: Proc. NAACL-HLT 2012 Workshop, Montreal, Canada, pp. 20–28.

Barnard, E., Davel, M., van Heerden, C., 2009. ASR corpus design for resource-scarce languages. In: Proc. Interspeech, pp. 2847–2850.

Barnard, E., Davel, M., van Huyssteen, G.B., 2010. Speech technology for information access: a South African case study. In: Proceedings of the AAAI Spring Symposium on Artificial Intelligence for Development (AI-D), Palo Alto, California, March 2010, pp. 8–13.

Barnett, J., Corrada, A., Gao, G., Gillik, L., Ito, Y., Lowe, S., Manganaro, L., Peskin, B., 1996. Multilingual speech recognition at Dragon Systems. In: Proc. ICSLP, Philadelphia, pp. 2191–2194.

Berment, V., 2004. Méthodes pour informatiser des langues et des groupes de langues peu dotées. Ph.D. Thesis, J. Fourier University – Grenoble I, May 2004.

Besacier, L., Zhou, B., Gao, Y., 2006. Towards speech translation of non written languages. In: IEEE/ACL SLT 2006. Aruba, December 2006.

Bhanuprasad, K., Svenson, M., 2008. Errgrams – a way to improving ASR for highly inflective Dravidian languages. In: Proc. 3rd International Joint Conf. on Natural Language Processing IJCNLP’08, India, pp. 805–810.

Billa, J., Ma, K., McDonough, J., Zavaliagkos, G., Miller, D.R., Ross, K.N., El-Jaroudi, A., 1997. Multilingual speech recognition: the 1996 Byblos Callhome system. In: Proc. Eurospeech-1997, Rhodes, Greece, pp. 363–366.

Cai, J., 2008. Transcribing southern min speech corpora with a web-based language learning system. In: SLTU’08, Hanoi, Vietnam.

Carki, K., Geutner, P., Schultz, T., 2000. Turkish LVCSR: towards better speech recognition for agglutinative languages. In: IEEE ICASSP.

Cetin, O., 2008. Unsupervised adaptive speech technology for limited resource languages: a case study for Tamil. In: SLTU’08, Hanoi, Vietnam.

Chan, H.Y., Rosenfeld, R., 2012. Discriminative pronunciation learning for speech recognition for resource scarce languages. In: Proceedings of the 2nd ACM Symposium on Computing for Development. Article No. 12.

Charniak, E., Knight, K., Yamada, K., 2003. Syntax-based language models for machine translation. In: Proc. IX MT Summit, New Orleans, USA, pp. 40–46.

Charoenpornsawat, P., Hewavitharana, S., Schultz, T., 2006. Thai grapheme-based speech recognition. In: Human Language Technology Conference (HLT).

Chelba, 2000, Structured language model, Computer Speech and Language, 10, 283, 10.1006/csla.2000.0147

Cohen, P., Dharanipragada, S., Gros, J., Monkowski, M., Neti, C., Roukos, S., Ward, T., 1997. Towards a universal speech recognizer for multiple languages. In: Proc. Automatic Speech Recognition and Understanding (ASRU), Santa Barbara, CA, pp. 591–598.

Constantinescu, A., Chollet, G., 1997. On cross-language experiments and data-driven units for ALISP. In: Proc. Automatic Speech Recognition and Understanding (ASRU), Santa Barbara, CA, pp. 606–613.

Creutz, M., Lagus, K., 2005. Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Computer and Information Science, Report A81, Helsinki University of Technology, Finland.

Creutz, 2007, Morph-based speech recognition and modeling of out-of-vocabulary words across languages, ACM Transactions on Speech and Language Processing, 5, 10.1145/1322391.1322394

Crystal, D., 2000. Language Death. Cambridge University Press, Cambridge.

Cucu, H., Besacier, L., Burileanu, C., Buzo, A., 2011. Investigating the role of machine translated text in ASR domain adaptation: unsupervised and semi-supervised methods. In: Proc. ASRU 2011, Hawaii, USA.

Cucu, H., Besacier, L., Burileanu, C., Buzo, A., 2012. ASR domain adaptation methods for low-resourced languages: application to Romanian language. In: EUSIPCO’2012, Bucharest, Romania.

Cucu, H., Buzo, A., Besacier, L., Burileanu, C., 2013. SMT-based ASR domain adaptation methods for under-resourced languages: application to Romanian. Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.05.003.

Davel, M.H., van Heerden, C., Kleynhans, N., Barnard, E., 2011. Efficient harvesting of Internet audio for resource-scarce ASR. In: Proc. Interspeech, pp. 3153–3156.

De Vries, N.J., Badenhorst, J., Davel, M.H., Barnard, E., De Waal, A., 2011. Woefzela – an open-source platform for ASR data collection in the developing world. In: Proc. Interspeech, pp. 3177–3180.

De Vries, N.J., Davel, M.H., Badenhorst, J., Basson, W.D., de Wet, F., Barnard, E., De Waal, A., 2013. A smartphone-based ASR data collection tool for under-resourced languages, Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.07.001.

Denoual, E., Lepage, Y., 2006. The character as an appropriate unit of processing for non-segmenting languages. In: NLP Annual Meeting, Tokyo, Japan, pp. 731–734.

Do, T., Besacier, L., Castelli, E., 2010. Unsupervised SMT for a low-resourced language pair. In: Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Penang, Malaysia.

Dugast, C., Aubert, X., Kneser, R., 1995. The Philips large-vocabulary recognition system for American English, French, and German. In: Proc. Eurospeech, Madrid, pp. 197–200.

Ekpenyong, M., Urua, E.-A., Watts, O., King, S., Yamagishi, J., 2013. Statistical parametric speech synthesis for Ibibio, Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.02.003.

Ganapathiraju, A., Hamaker, J., Picone, J., 2000. Hybrid SVM/HMM architectures for speech recognition. In: Proceedings of Speech Transcription Workshop, pp. 504–507.

Gebreegziabher, M., Besacier, L., 2012. English-Amharic statistical machine translation. In: SLTU – Workshop on Spoken Language Technologies for Under-Resourced Languages, Cape-Town, South Africa.

Gelas, H., Besacier, L., Rossato, S., Pellegrino, F., 2010. Using automatic speech recognition for phonological purposes: study of vowel length in Punu (Bantu B40). In: Laphon 12, New Mexico (US), July 2010.

Gelas, H., Teferra Abate, S., Besacier, L., Pellegrino, F., 2011. Quality assessment of crowdsourcing transcriptions for African languages. In: Interspeech 2011 Florence, Italy, 28–31 August 2011.

Gemmeke, J.F., Van hamme, H., 2011. A hierarchical exemplar-based sparse model of speech with an application to ASR. IEEE ASRU 2011, HI, USA.

Ghoshal, A., Jansche, M., Khudanpur, S., Riley, M., Ulinski, M., 2009. Web-derived pronunciations. In: IEEE ICASSP.

Gizaw, S., 2008. Multiple pronunciation model for Amharic speech recognition system. In: SLTU 2008, Hanoi, Vietnam.

Glass, 1995, Multi-lingual spoken language understanding in the MIT voyager system, Speech Communication, 17, 1, 10.1016/0167-6393(95)00008-C

Godfrey, J.J., Holliman, E.C., McDaniel, J., 1992. SWITCHBOARD: telephone speech corpus for research and development. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 517–520.

Gokcen, S., Gokcen, J., 1997. A multilingual phoneme and model set: towards a universal base for automatic speech recognition. In: Proc. Automatic Speech Recognition and Understanding (ASRU), Santa Barbara, CA, pp. 599–603.

Grezl, F., et al., 2007. Probabilistic and bottle-neck features for LVCSR of meetings. In: Proc. ICASSP, USA.

Hermansky, H., Wellis, D., Sharma, S., 2000. Tandem connectionist feature extraction for conventional HMM systems. In: Proc. ICASSP, Turkey.

Huang, C., Chang, E., Zhou, J., Lee, K.-F., 2000. Accent modeling based on pronunciation dictionary adaptation for large vocabulary Mandarin speech recognition. In: Proc. INTERSPEECH-2000, Beijing, China, pp. 818–821.

Huet, 2010, Morpho-syntactic postprocessing of N-best lists for improved French automatic speech recognition, Computer Speech and Language, 24, 663, 10.1016/j.csl.2009.10.001

Hughes, T., Nakajima, K., Ha, L., Moreno, P., LeBeau, M., 2010. Building transcribed speech corpora quickly and cheaply for many languages. In: Proc. Interspeech, Makuhari, Japan, pp. 1914–1917.

1999

Jensson, A., 2008. Development of a speech recognition system for Icelandic using machine translated text. In: SLTU’08, Hanoi, Vietnam.

Jing, Z., Min, Z., 2010. Speech recognition system based improved DTW algorithm. In: Proc. Int. Conf. on Computer, Mechatronics, Control and Electronic Engineering CMCE-2010, vol. 5, pp. 320–323.

Kanejiya, D.P., Kumar, A., Prasad, S., 2003. Statistical language modeling using syntactically enhanced LSA. In: Proc. TIFR Workshop on Spoken Language Processing, Mumbai, India, pp. 93–100.

Kanthak, S., Ney, H., 2003. Multilingual acoustic modeling using graphemes. In: Eurospeech-2003, Geneva, Switzerland, pp. 1145–1148.

Karanasou, P., Lamel, L., 2010. Comparing SMT methods for automatic generation of pronunciation variants. In: IceTAL 2010, Reykjavik, Iceland, p. 167.

Karpov, A., Kipyatkova, I., Ronzhin, A., 2011. Very large vocabulary ASR for spoken Russian with syntactic and morphemic analysis. In: Proc. Interspeech’2011, Florence, Italy, pp. 3161–3164.

Karpov, A., Markov, K., Kipyatkova, I., Vazhenina, D., Ronzhin, A., 2013. Large vocabulary Russian speech recognition using syntactico-statistical language modeling. Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.07.004.

Kiecza, D., Schultz, T., Waibel, A., 1999. Data-driven determination of appropriate dictionary units for Korean LVCSR. In: Proceedings of the International Conference on Speech Processing, pp. 323–327.

Killer, M., Stüker, S., Schultz, T., 2003. Grapheme based speech recognition. In: Interspeech.

Kipyatkova, I., Karpov, A., Verkhodanova, V., Zelezny, M., 2012. Analysis of long-distance word dependencies and pronunciation variability at conversational Russian speech recognition. In: Proc. FedCSIS-2012, Wrocław, Poland, pp. 719–725.

Köhler, J., 1998. Language adaptation of multilingual phone models for vocabulary independent speech recognition tasks. In: Proc. ICASSP, Seattle, pp. 417–420.

Krauwer, S., 2003. The basic language resource kit (BLARK) as the first milestone for the language resources roadmap. In: Proceedings of the 2003 International Workshop Speech and Computer SPECOM-2003, Moscow, Russia, pp. 8–15.

Kuo, H.-K.J., Mangu, L., Emami, A., Zitouni, I., Lee, Y.-S., 2009. Syntactic features for Arabic speech recognition. In: Proc. International Workshop ASRU’2009, Merano, Italy, pp. 327–332.

Kurimo, M., Puurula, A., Arisoy, E., Siivola, V., Hirsimaki, T., Pylkkonen, J., Alumae, T., Saraclar, M., 2006. Unlimited vocabulary speech recognition for agglutinative languages. In: Proc. HLT-NAACL, NY, USA.

Kurimo, M., et al., 2006. Unsupervised segmentation of words into morphemes – Morpho Challenge. Application to automatic speech recognition. In: Proc. Interspeech’06, Pittsburgh, PA, USA, pp. 1021–1024.

Lamel, L., Adda-Decker, M., Gauvain, J.L., 1995. Issues in large vocabulary multilingual speech recognition. In: Proc. Eurospeech, Madrid, pp. 185–189.

Laurent, A., Deléglise, P., Meignier, S., 2009. Grapheme to phoneme conversion using an SMT system. In: Interspeech 2009, Brighton, UK, pp. 708–711.

Le, V.-B., Besacier, L., 2009. Automatic speech recognition for under-resourced languages: application to Vietnamese language. IEEE Transactions on Audio, Speech and Language Processing 17(8), 1471–1482.

Le, V.B., Bigi, B., Besacier, L., Castelli, E., 2003. Using the Web for fast language model construction in minority languages. In: Eurospeech’03, Geneva, Switzerland, pp. 3117–3120.

Lee, 2009, Probabilistic modeling of Korean morphology, IEEE Transactions on Audio, Speech & Language Processing, 17, 945, 10.1109/TASL.2009.2019922

Loof, J., Gollan, C., Ney, H., 2009. Cross-language bootstrapping for unsupervised acoustic model training: rapid development of a Polish speech recognition system. In: Interspeech 2009. Brighton, UK.

Lopatková, M., Plátek, M., Kuboň, V., 2005. Modeling syntax of free word-order languages: dependency analysis by reduction. In: Proc. TSD’2005, Springer LNAI 3658, Karlovy Vary, Czech Republic, pp. 140–147.

Mihajlik, P., Fegyó, T., Tüske, Z., Ircing, P., 2007. Morpho-graphemic approach for the recognition of spontaneous speech in agglutinative languages – like Hungarian. In: Interspeech’07, Antwerp, Belgium.

Mikolov, T., Karafiat, M., Burget, L., Cernocky, J., Khudanpur, S., 2010. Recurrent neural network based language model. In: Proc. INTERSPEECH-2010, Makuhari, Japan, pp. 1045–1048.

Mohamed, 2012, Acoustic modeling using deep belief networks, IEEE Transactions on Audio, Speech, and Language Processing, 20, 14, 10.1109/TASL.2011.2109382

Muthusamy, Y.K., Cole, R.A., 1992. Automatic segmentation and identification of ten languages using telephone speech. In: Second International Conference on Spoken Language Processing.

Nakajima, H., Yamamoto, H., Watanabe, T., 2002. Language model adaptation with additional text generated by machine translation. In: COLING 2002, vol. 2, Taipei, Taiwan, pp. 716–722.

Nanjo, H., Kawahara, T., 2005. A new ASR evaluation measure and minimum Bayes-risk decoding for open-domain speech understanding. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP-2005, PA, USA, pp. 1053–1056.

The US NIST 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation Plan, 2009.

Oparin, I., Glembek, O., Burget, L., Černocký, J., 2008. Morphological random forests for language modeling of inflectional languages. In: Proc. IEEE Workshop on Spoken Language Technology SLT’08, Goa, India.

Parent, G., Eskenazi, M., 2010. Toward better crowdsourced transcription: transcription of a year of the Let’s Go bus information system data. In: Proceedings of IEEE Workshop on Spoken Language Technology, Berkeley, California, December 2010, pp. 312–317.

Patel, 2009, A comparative study of speech and dialed input voice interfaces in rural India, 51

Patel, 2010, Avaaj Otalo: a field study of an interactive voice forum for small farmers in rural India, 733

Pellegrini, T., Lamel, L., 2006. Investigating automatic decomposition for ASR in less represented languages. In: ICSLP’06, Pittsburgh.

Pellegrini, T., Lamel, L., 2008. Are audio or textual training data more important for ASR in less-represented languages? In: SLTU’08, Hanoi, Vietnam.

Pellegrini, 2009, Automatic word decompounding for ASR in a morphologically rich language: application to Amharic, IEEE Transactions on Audio, Speech & Language Processing, 17, 863, 10.1109/TASL.2009.2022295

Plahl, C., Schlueter, R., Ney, H., 2011. Cross-lingual portability of Chinese and English neural network features for French and German LVCSR. In: Proc. ASRU, USA.

Rastrow, A., Dredze, M., Khudanpur, S., 2012. Fast syntactic analysis for statistical language modeling via substructure sharing and uptraining. In: Proc. 50th Annual Meeting of Association for Computational Linguistics ACL’2012, Jeju, Korea, pp. 175–183.

Ronzhin, 2007, Russian voice interface, Pattern Recognition and Image Analysis, 17, 321, 10.1134/S1054661807020216

Rotovnik, 2007, Large vocabulary continuous speech recognition of an inflected language using stems and endings, Speech Communication, 49, 437, 10.1016/j.specom.2007.02.010

Roux, J.C., Botha, E.C., du Preez, J.A., 2000. Developing a multilingual telephone based information retrieval system in African languages. In: Proceedings of the Second International Conference on Language Resources and Evaluation, pp. 975–980.

Sak, H., Saraclar, M., Güngör, T., 2010. Morphology-based and sub-word language modeling for Turkish speech recognition. In: ICASSP 2010, pp. 5402–5405.

Sarikaya, R., Afify, M., Gao, Y., 2007. Joint morphological-lexical language modeling (JMLLM) for Arabic. In: Proc. ICASSP’07, vol. 4, pp. 181–184.

Schlippe, T., Ochs, S., Schultz, T., 2010. Wiktionary as a source for automatic pronunciation extraction. In: Interspeech 2010, Makuhari, Japan, 26–30 September 2010.

Schlippe, T., Ochs, S., Schultz, T., 2012a. Grapheme-to-phoneme model generation for Indo-European languages. In: ICASSP 2012, Kyoto, Japan, 25–30 March 2012.

Schlippe, T., Ochs, S., Vu, N.T., Schultz, T., 2012b. Automatic error recovery for pronunciation dictionaries. In: Interspeech 2012, Portland, Oregon, 9–13 September 2012.

Schlippe, T., Ochs, S., Schultz, T., 2013. Web-based tools and methods for rapid pronunciation dictionary creation. Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.06.015.

Schultz, T., 2002. GlobalPhone: a multilingual speech and text database developed at Karlsruhe University. In: ICSLP, pp. 345–348.

Schultz, T., 2006. Multilingual speech processing. In: Tanja Schultz, Katrin Kirchhoff (Eds.), Elsevier, Academic Press, ISBN 13: 978-0-12-088501-5, 2006.

Schultz, T., Black, A.W., Badaskar, S., Hornyak, M., Kominek, J., 2007. SPICE: web-based tools for rapid language adaptation in speech processing systems. In: Interspeech 2007, Antwerp, Belgium.

Schultz, T., Vu, N.T., Schlippe, T., 2013. GlobalPhone: a multilingual text & speech database in 20 languages. In: ICASSP 2013, Vancouver, Canada.

Schultz, T., Waibel, A., 1998. Language independent and language adaptive LVCSR. In: Proc. ICSLP, Sydney, pp. 1819–1822.

Schultz, 2001, Language independent and language adaptive acoustic modeling for speech recognition, Speech Communication, 35, 31, 10.1016/S0167-6393(00)00094-7

Seide, F., Li, G., Chen, X., Yu, D., 2011. Feature engineering in context-dependent deep neural networks for conversational speech transcription. In: Proc. ASRU-2011 International Workshop, HI, USA, pp. 24–29.

Siniscalchi, 2013, Universal attribute characterization of spoken languages for automatic spoken language recognition, Computer Speech & Language, 27, 209, 10.1016/j.csl.2012.05.001

Solera-Urena, 2007, Robust ASR using support vector machines, Speech Communication, 49, 253, 10.1016/j.specom.2007.01.013

Stahlberg, F., Schlippe, T., Vogel, S., Schultz, T., 2012. Word segmentation through cross-lingual word-to-phoneme alignment. In: Proceedings of The Fourth IEEE Workshop on Spoken Language Technology (SLT 2012), Miami, Florida, 2–5 December 2012.

Stahlberg, F., Schlippe, T., Vogel, S., Schultz, T., 2013. Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment. In: Proceedings of the 1st international conference on statistical language and speech processing (SLSP 2013), Tarragona, Spain, 29–31 July 2013.

Stephenson, T.A., Escofet, J., Magimai-Doss, M., Bourlard, H., 2002. Dynamic Bayesian network based speech recognition with pitch and energy as auxiliary variables, Technical Report Idiap-RR-24-2002, p. 10.

Stolcke, A., Grezl, F., Hwang, M.-Y., Lei, X., Morgan, N., Vergyri, D., 2006. Cross-domain and cross-lingual portability of acoustic features estimated by multilayer perceptrons. In: Proc. ICASSP 2006.

Stüker, S., 2008. Integrating Thai grapheme based acoustic models into the ML-mix framework – for language independent and cross-language ASR. In: SLTU’08, Hanoi, Vietnam.

Stüker, S., Schultz, T., Metze, F., Waibel, A., 2003. Multilingual articulatory features. In: ICASSP 2003.

Stüker, S., Besacier, L., Waibel, A., 2009. Human translations guided language discovery for ASR systems. In: InterSpeech-2009, Brighton, UK.

Suenderman, K., Liscombe, J., 2009. Localization of speech recognition in spoken dialog systems: how machine translation can make our lives. In: Interspeech 2009, Brighton, UK, pp. 1475–1478.

Szarvas, M., Furui, S., 2003. Finite-state transducer based modeling of morphosyntax with applications to Hungarian LVCSR. In: Proc. ICASSP, Hong Kong, China, pp. 368–371.

Tachbelie, M., Abate, S.T., Besacier, L., Rossato, S., 2012. Syllable-based and hybrid acoustic models for Amharic speech recognition. In: SLTU – Workshop on Spoken Language Technologies for Under-Resourced Languages, Cape-Town, South Africa.

Tachbelie, M., Abate, S.T., Besacier, L., 2013. Using different acoustic, lexical and language modeling units for ASR of an under-resourced language – Amharic. Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.01.008.

Tarjan, B., Mihajlik, P., 2010. On morph-based LVCSR improvements. In: Proc. 2nd Int. Workshop on Spoken Languages Technologies for Under-resourced Languages SLTU-2010, Malaysia, pp. 10–16.

Thomas, S., Ganapathy, S., Hermansky, H., 2012a. Multilingual MLP features for low-resource LVCSR systems. In: Proc. ICASSP, Japan.

Thomas, S., Ganapathy, S., Jansen, A., Hermansky, H., 2012b. Data-driven posterior features for low resource speech recognition applications. In: Proc. Interspeech, USA.

Toth, L., Frankel, J., Gosztolya, G., King, S., 2008. Cross-lingual portability of MLP-based tandem features – a case study for English and Hungarian. In: Proc. Interspeech.

Trentin, 2001, A survey of hybrid ANN/HMM models for automatic speech recognition, Neurocomputing, 37, 91, 10.1016/S0925-2312(00)00308-8

van Heerden, C., Kleynhans, N., Barnard, E., Davel, M., 2010. Pooling ASR data for closely related languages. In: Proceedings of the Workshop on Spoken Languages Technologies for Under-Resourced Languages (SLTU 2010), Penang, Malaysia, May 2010, pp. 17–23.

van Niekerk, D.R., Barnard, E., 2013. Predicting utterance pitch targets in Yoruba for tone realisation in speech synthesis, Speech Communication. http://dx.doi.org/10.1016/j.specom.2013.01.009.

Vergyri, D., Kirchhoff, K., Duh, K., Stolcke, A., 2004. Morphology-based language modeling for Arabic speech recognition. In: Proc. ICSLP’04, pp. 2245–2248.

Vesely, K., Karafiat, M., Grezl, F., Janda, M., Egorova, E., 2012. The language-independent bottleneck features. In: Proc. SLT, USA.

Vu, N.T., Kraus, F., Schultz, T., 2010. Multilingual A-stabil: a new confidence score for multilingual unsupervised training. In: Proc. SLT, USA.

Vu, N.T., Kraus, F., Schultz, T., 2011. Rapid building of an ASR system for under-resourced languages based on multilingual unsupervised training. In: Proc. Interspeech, Italy.

Vu, N.T., Metze, F., Schultz, T., 2012a. Multilingual bottle-neck feature for under resourced languages. In: Proc. SLTU, South Africa.

Vu, N.T., Breiter, W., Metze, F., Schultz, T., 2012b. An investigation on initialization schemes for multilayer perceptron training using multilingual data and their effect on ASR performance. In: Proc. Interspeech, USA.

Wheatley, B., Kondo, K., Anderson, W., Muthusamy, Y., 1994. An evaluation of cross-language adaptation for rapid HMM development in a new language. In: Proc. ICASSP, Adelaide, pp. 237–240.

Whittaker, E.W.D., 2000. Statistical language modelling for automatic speech recognition of Russian and English. Ph.D. thesis, Cambridge Univ., p. 140.

Whittaker, E.W.D., Woodland, P.C., 2001. Efficient class-based language modelling for very large vocabularies. In: ICASSP-2001, Salt Lake City, USA, pp. 545–548.

Wissing, 2008, Vowel variations in Southern Sotho: an acoustical investigation, Southern African Linguistics and Applied Language Studies, 26, 255, 10.2989/SALALS.2008.26.2.6.570

Young, 1997, Multilingual large vocabulary speech recognition: the European SQALE project, Computer Speech & Language, 11, 73, 10.1006/csla.1996.0023

Young, 2008, HMMs and related speech recognition technologies, 539

Yu, D., Siniscalchi, S.M., Deng, L., Lee, C.-H., 2012. Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition. In: Proc. ICASSP-2012, pp. 4169–4172.
