High performance telephone bandwidth speaker independent continuous digit recognition

P. Cosi1, J.-P. Hosom, A. Valente2,3
1Istituto di Fonetica e Dialettologia - C.N.R., Padova, ITALY
2Dipartimento di Elettronica e Informatica, Università di Padova, Padova, ITALY
3Dipartimento di Elettronica e Informatica, Università di Padova, Padova, ITALY

Abstract

This paper describes the development of a high-performance, telephone-bandwidth, speaker-independent connected digit recognizer for Italian. The CSLU Speech Toolkit was used to develop and implement the hybrid ANN/HMM system, which is trained on context-dependent categories to account for coarticulatory variation. Various front-end processing methods and system architectures were compared. With the best features (MFCC with CMS + Δ) and the best network (a 4-layer fully connected feed-forward network), the system achieved 98.92% word recognition accuracy and 92.62% sentence recognition accuracy on a test set of the FIELD continuous digit recognition task.
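As an illustration of the front-end named in the abstract, the sketch below applies cepstral mean subtraction (CMS) to a sequence of cepstral frames and appends delta (Δ) coefficients. It is a minimal sketch, not the paper's implementation: the function names, the regression window of 2 frames, and the edge-repetition padding are assumptions chosen for illustration, and the MFCC computation itself is taken as given.

```python
from typing import List

# One inner list of cepstral coefficients per frame (assumed input format).
Frames = List[List[float]]


def cepstral_mean_subtraction(frames: Frames) -> Frames:
    """Subtract the per-utterance mean of each coefficient from every frame."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def delta(frames: Frames, window: int = 2) -> Frames:
    """Regression-based deltas; the first and last frames are repeated as padding."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(n_coeffs):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def cms_plus_delta(frames: Frames, window: int = 2) -> Frames:
    """Return CMS-normalized coefficients with their deltas appended per frame."""
    normalized = cepstral_mean_subtraction(frames)
    deltas = delta(normalized, window)
    return [c + d for c, d in zip(normalized, deltas)]


if __name__ == "__main__":
    # Toy input: the first coefficient ramps linearly, the second is constant.
    toy = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0], [5.0, 10.0]]
    for row in cms_plus_delta(toy):
        print(row)
```

CMS removes the constant offset that a fixed telephone channel adds in the cepstral domain, which is why it is a common choice for telephone-bandwidth speech; the deltas add a local estimate of how each coefficient changes over time.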

Keywords

#Telephony #Bandwidth #Speech recognition #Automatic speech recognition #Natural languages #Hidden Markov models #System testing #Mel frequency cepstral coefficient #Collision mitigation #Feedforward systems
