Applying dynamic context into MLP/HMM speech recognition system

Computer Speech & Language - Tập 15 - Trang 233-255 - 2001
Petri Salmela1
1Tampere University of Technology, Digital and Computer Systems Laboratory, PO Box 553, FIN–33101 Tampere, Finland

Tài liệu tham khảo

Bishop, 1996 Bourlard, 1990, Links between Markov models and multilayer perceptrons, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 1167, 10.1109/34.62605 Bourlard, 1993, Continuous speech recognition by connectionist statistical methods, IEEE Transactions on Neural Networks, 4, 893, 10.1109/72.286885 H. Bourlard, B. D’hoore, J. Boite, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1994, 373, 376 Chou, 2000, Discriminant-function-based minimum recognition error rate pattern-recognition approach to speech recognition, Proceedings of the IEEE, 88, 1201, 10.1109/5.880080 G. Cook, T. Robinson, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, 917, 920 Deller, 1993 Deng, 1994, Speech recognition using hidden Markov models with polynomial regression function as nonstationary states, IEEE Transactions on Speech and Audio Processing, 2, 507, 10.1109/89.326610 Dugast, 1994, Combining TDNN and HMM in a hybrid system for improved continuous-speech recognition, IEEE Transactions on Speech and Audio Processing, 2, 217, 10.1109/89.260364 H. Franco, M. Weintraub, M. Cohen, Proceedings of the IEEE International Conference on Neural Networks, Houston, Texas, USA, 1997, 2089, 2092 Furui, 1986, Speaker independent isolated word recognition using dynamic features of speech spectrum, IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-34, 52, 10.1109/TASSP.1986.1164788 Gu, 1991, Isolated-utterance speech recognition using hidden Markov models with bounded state durations, IEEE Transactions on Signal Processing, 39, 1743, 10.1109/78.91145 R. Haeb-Umbach, D. Geller, H. Ney, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Minneapolis, MN, USA, 1993, 239, 242 Huang, 1992, A combined self-organizing feature map and multilayer perceptron for isolated word recognition, IEEE Transactions on Signal Processing, 40, 2651, 10.1109/78.165652 J. Iso-Sipilä, K. Laurila, P. Haavisto, Proceedings of IEEE Nordic Signal Processing Symposium, Espoo, Finland, 1996, 107, 110 Jelinek, 1998 Johnson, 1992 Juang, 1997, Minimum error rate methods for speech recognition, IEEE Transactions on Speech and Audio Processing, 5, 257, 10.1109/89.568732 J. Kangas, Proceedings of the International Joint Conference on Neural Networks, San Diego, California, USA, 1990, 331, 336 J. Kangas, Proceedings of the 1991 International Conference on Artificial Neural Networks, Espoo, Finland, 1991, 1591, 1594 S. Katagiri, C.-H. Lee, B.-H. Juang, Proceedings of the 1991 IEEE Workshop on Neural Networks for Signal Processing, Princeton, New Jersey, USA, 1991, 299, 308 V. Kepuska, J. Gowdy, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Glasgow, Scotland, 1989, 504, 507 Kohonen, 2001 Kokkonen, 1990, Using self-organizing maps and multi-layered feed-forward nets to obtain phonemic transcription of spoken utterances, Speech Communication, 9, 541, 10.1016/0167-6393(90)90029-9 Krogh, 1999, Hidden neural networks, Neural Computation, 11, 541, 10.1162/089976699300016764 K. Laurila, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Munich, Germany, 1997, 871, 874 Morgan, 1995, Neural networks for statistical recognition of continuous speech, Proceedings of the IEEE, 83, 741, 10.1109/5.381844 N. Morgan, D. Ellis, E. Fosler-Lussier, A. Janin, B. Kingsbury, Proceedings of the DARPA Broadcast News Workshop, Herndon, Virginia, 1999 D. Nquyen, B. Widrow, Proceedings of International Joint Conference of Neural Networks, San Diego, California, USA, 1990, 21, 26 Ostendorf, 1996, From HMMs to segment models: stochastic modelling for CSR, 185 B. Petek, A. Waibel, J. Tebelskis, Proceedings of the European Conference on Speech Communication and Technology, Genova, Italy, 1991, 1407, 1410 Rabiner, 1989, Tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77, 257, 10.1109/5.18626 Richard, 1991, Neural network classifiers estimate Bayesian a posteriori probabilities, Neural Computation, 3, 461, 10.1162/neco.1991.3.4.461 T. Robinson, M. Hochberg, S. Renals, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, 37, 40 Robinson, 1996, The use of recurrent neural networks in continuous speech recognition, 233 P. Salmela, K. Laurila, M. Lehtokangas, J. Saarinen, Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Tokyo, Japan, 1999 a, 165, 171 Salmela, 1999, Neural network based digit recognition system for voice dialling in noisy environments, International Journal of Information Sciences, 121, 171, 10.1016/S0020-0255(99)00077-8 Senior, 1996, Forward-backward retraining of recurrent neural networks, 743 J. Tebelskis, 1995 Viikki, 1998, Cepstral domain segmental feature vector normalization for noise robust speech recognition, Speech Communication, 25, 133, 10.1016/S0167-6393(98)00033-8 Vogl, 1988, Accelerating the convergence of the back-propagation method, Biological Cybernetics, 59, 257, 10.1007/BF00332914 S. Young, N. Russell, J. Thornton, Technical Report, 1989, University of Cambridge, Department of Engineering, Cambridge, England