Speech recognition of broadcast news for the European Portuguese language
Abstract
This paper describes the development of a large vocabulary continuous speech recognition system for a European Portuguese broadcast news (BN) task, carried out within the ALERT project. We first present the baseline recogniser, AUDIMUS, which was originally developed with a corpus of read newspaper text. It is a hybrid system that combines phone probabilities generated by several multilayer perceptrons (MLPs) trained on distinct feature sets. The paper then details the modifications made to this system: the development of a new language model, the vocabulary and pronunciation lexicon, and training on the data currently available from the ALERT BN corpus. Trained on this BN corpus, the system achieved a word error rate (WER) of 18.4% when tested on the F0 focus condition (studio, planned, native, clean) and 35.2% when tested on all focus conditions.
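For readers unfamiliar with the metric, the WER figures above follow the standard definition: the number of word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the scoring tool used in the paper; the function name `word_error_rate` and the example strings are hypothetical, and it assumes whitespace tokenisation with no text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real BN scoring usually applies text normalisation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, a WER above 100% is possible when the hypothesis is much longer than the reference.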
Keywords
#Speech recognition #Broadcasting #Natural languages #Streaming media #System testing #Databases #Vocabulary #Multimedia systems #TV #Audio recording