Investigating stochastic speech understanding

H. Bonneau-Maynard (1), F. Lefevre (1)
(1) Spoken Language Processing Group, Orsay cedex, FRANCE

Abstract

The need for human expertise in the development of a speech understanding system can be greatly reduced by the use of stochastic techniques. However, corpus-based techniques require the annotation of large amounts of training data. Manual semantic annotation of such corpora is tedious, expensive, and subject to inconsistencies. This work investigates the influence of the training corpus size on the performance of the understanding module. The use of automatically annotated data is also investigated as a means of increasing the corpus size at very low cost. First, a stochastic speech understanding model developed using data collected with the LIMSI ARISE dialog system is presented. Its performance is shown to be comparable to that of the rule-based caseframe grammar currently used in the system. In a second step, two ways of reducing the development cost are pursued: (1) reducing the amount of manually annotated data used to train the stochastic models and (2) using automatically annotated data in the training process.

Keywords

#Stochastic processes #Costs #Natural languages #Stochastic systems #Humans #Data mining #Speech analysis #Training data #Performance evaluation #Telephony
