High performance telephone bandwidth speaker independent continuous digit recognition
Abstract
The development of a high-performance, telephone-bandwidth, speaker-independent connected digit recognizer for Italian is described. The CSLU Speech Toolkit was used to develop and implement the hybrid ANN/HMM system, which is trained on context-dependent categories to account for coarticulatory variation. Several front-end processing schemes and system architectures were compared. With the best features (MFCC with cepstral mean subtraction (CMS) plus Δ coefficients) and the best network (a 4-layer, fully connected feed-forward network), the system achieved 98.92% word recognition accuracy and 92.62% sentence recognition accuracy on a test set of the FIELD continuous digit recognition task.
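The abstract names the best front end only as MFCC with cepstral mean subtraction (CMS) plus Δ coefficients. The following minimal Python sketch shows that post-processing under several assumptions: the MFCC frames have already been computed, CMS is taken over the whole utterance, and the Δ coefficients use an HTK-style regression over ±2 frames. The function names and the window size are illustrative and do not come from the paper.

```python
from typing import List

Frame = List[float]  # one frame of cepstral coefficients


def cepstral_mean_subtraction(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-coefficient mean over the whole utterance (assumed CMS variant)."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


def delta_features(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2).

    Frames beyond the utterance edges are replaced by the first/last frame.
    The window of 2 is an assumption; the abstract does not give it.
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        d = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, T - 1)][k]  # clamp at the end
                prv = frames[max(t - n, 0)][k]      # clamp at the start
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def mfcc_cms_delta(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Concatenate CMS-normalized static cepstra with their deltas."""
    static = cepstral_mean_subtraction(frames)
    dyn = delta_features(static, window)
    return [s + d for s, d in zip(static, dyn)]


# Toy input: 3 frames of 2 coefficients each.
frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
print(mfcc_cms_delta(frames, window=1))
```

A fixed telephone channel multiplies the speech spectrum, which after the log and DCT becomes an approximately constant offset in every cepstral frame. Subtracting the utterance mean therefore largely removes it. The deltas then add the local spectral dynamics that static cepstra lack.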
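In a hybrid ANN/HMM recognizer, the network estimates, for each frame, posterior probabilities of the context-dependent categories, and the HMM search uses these estimates in place of Gaussian emission densities. A common recipe from the hybrid literature divides each posterior by its category prior to obtain a scaled likelihood. The abstract does not say whether this system does so. The minimal Python sketch below assumes that recipe and reads "4-layer" as an input layer, two sigmoid hidden layers, and a softmax output layer. The layer sizes, activations, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A layer is (weights[out][in], biases[out]); shapes are illustrative only.
Layer = Tuple[List[List[float]], List[float]]


def _affine(x: Sequence[float], layer: Layer) -> List[float]:
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def _sigmoid(v: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def _softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]


def mlp_posteriors(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Fully connected feed-forward pass: sigmoid hidden layers, softmax output."""
    h = list(x)
    for layer in layers[:-1]:
        h = _sigmoid(_affine(h, layer))
    return _softmax(_affine(h, layers[-1]))


def scaled_log_likelihoods(posteriors: Sequence[float], priors: Sequence[float]) -> List[float]:
    """log P(q|x) - log P(q), i.e. log of p(x|q)/p(x), used as an HMM emission score."""
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]


def zeros(rows: int, cols: int) -> List[List[float]]:
    return [[0.0] * cols for _ in range(rows)]


# Hypothetical sizes: 2 inputs -> 3 hidden -> 3 hidden -> 4 categories.
layers: List[Layer] = [
    (zeros(3, 2), [0.0] * 3),
    (zeros(3, 3), [0.0] * 3),
    (zeros(4, 3), [0.0] * 4),
]
post = mlp_posteriors([0.3, -1.2], layers)
print(post)
print(scaled_log_likelihoods(post, [0.5, 0.25, 0.125, 0.125]))
```

Dividing by the prior keeps frequent categories (such as silence) from dominating the search simply because they were common in the training data.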
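The two figures in the abstract measure different things. Word accuracy is conventionally (N - S - D - I)/N, where N is the number of reference digits and S, D, and I count the substitutions, deletions, and insertions in a minimum-edit-distance alignment. Sentence accuracy is the share of test strings recognized with no error at all. The following minimal Python sketch shows that scoring, assuming unit edit costs. The digit strings in the usage example are invented and do not come from the FIELD test set.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions + deletions + insertions (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitution, or match at cost 0
        prev = cur
    return prev[-1]


def digit_accuracies(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> Tuple[float, float]:
    """Return (word accuracy %, sentence accuracy %) over (reference, hypothesis) pairs."""
    total_words = sum(len(ref) for ref, _ in pairs)
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    correct_strings = sum(1 for ref, hyp in pairs if list(ref) == list(hyp))
    word_acc = 100.0 * (total_words - total_errors) / total_words
    sent_acc = 100.0 * correct_strings / len(pairs)
    return word_acc, sent_acc


# Hypothetical Italian digit strings: (reference, recognizer output).
pairs = [
    ("tre sette due".split(), "tre sette due".split()),   # correct
    ("uno nove".split(), "uno nove".split()),             # correct
    ("zero zero otto".split(), "zero otto".split()),      # one deletion
    ("cinque sei".split(), "cinque tre sei".split()),     # one insertion
]
print(digit_accuracies(pairs))  # (80.0, 50.0)
```

For these four hypothetical strings, two errors in ten digits give 80.0% word accuracy, but only half the strings are entirely correct, so sentence accuracy is 50.0%. The same effect explains why 98.92% word accuracy can coexist with 92.62% sentence accuracy: a single digit error anywhere makes the whole string count as wrong.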
Keywords
#Telephony #Bandwidth #Speech recognition #Automatic speech recognition #Natural languages #Hidden Markov models #System testing #Mel frequency cepstral coefficient #Collision mitigation #Feedforward systems
