AGH corpus of Polish speech

Springer Science and Business Media LLC - Volume 50 - Pages 585-601 - 2015
Piotr Żelasko [1], Bartosz Ziółko [1, 2], Tomasz Jadczyk [1, 2], Dawid Skurzok [1, 2]
[1] AGH University of Science and Technology, Kraków, Poland
[2] Techmo sp. z o.o., Kraków, Poland

Abstract

A corpus of Polish speech, collected for use in automatic speech recognition (ASR) and text-to-speech (TTS) systems, is presented. The corpus consists of several groups of recordings: read sentences, spoken commands, a phonetically balanced TTS training corpus, telephone speech and others. The total duration of the recordings exceeds 25 h. There are 166 unique speakers, most of them aged 20–35 and one third of them female. The frequency of unique word occurrences was analysed in relation to larger text resources, and the most common words were identified and presented. The corpus was used as training data for an ASR system. Cross-validation training and testing of the SARMATA ASR system on our corpus yielded a phrase recognition rate of 91.9 %. The corpus was additionally evaluated in a comparative test against the CORPORA corpus, which showed a major increase in phrase recognition rate in favour of our corpus.
