Dynamic Bayesian network based speech recognition with pitch and energy as auxiliary variables

T.A. Stephenson1,2, J. Escofet1,2, M. Magimai-Doss3,2, H. Bourlard1,2
1Swiss Federal Institute of Technology, Lausanne, Switzerland
2Dalle Molle Institute for Perceptual Artificial Intelligence, Martigny, Switzerland
3Visiting IDIAP under the European Masters in Language and Speech, Technical University of Catalonia, Barcelona, Spain

Abstract

Pitch and energy are two fundamental features of speech and play an important role in human speech recognition. However, when they are incorporated as features in automatic speech recognition (ASR), they usually cause a significant degradation in recognition performance because of the noise inherent in estimating or modeling them. We show experimentally how this can be corrected either by conditioning the emission distributions on these features or by marginalizing them out during recognition. Since this is not straightforward with standard hidden Markov models (HMMs), the work was carried out in the framework of dynamic Bayesian networks (DBNs), which offer more flexibility in defining the topology of the emission distributions and in specifying whether variables should be marginalized out.
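
The two options named in the abstract, conditioning an emission distribution on an auxiliary variable and marginalizing that variable out, can be illustrated with a minimal sketch. It is a toy illustration under stated assumptions, not the authors' model: the auxiliary variable `a` is assumed to be discrete (for example, a quantized pitch value), the emission for a single hidden state is assumed to be a one-dimensional Gaussian, and all parameter values and function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Hypothetical toy parameters for ONE hidden state q (not taken from the paper).
# p(a | q): distribution of a discrete auxiliary variable, e.g. 0 = low pitch, 1 = high pitch.
P_AUX = [0.6, 0.4]
# p(x | q, a): one Gaussian per auxiliary value.
MEANS = [-1.0, 2.0]
VARIANCES = [1.0, 0.5]


def emission_conditioned(x, a):
    """p(x | q, a): emission likelihood when the auxiliary value a is observed."""
    return gaussian_pdf(x, MEANS[a], VARIANCES[a])


def emission_marginalized(x):
    """p(x | q) = sum_a p(a | q) * p(x | q, a): auxiliary variable summed out."""
    return sum(p_a * emission_conditioned(x, a) for a, p_a in enumerate(P_AUX))


if __name__ == "__main__":
    x = 0.5
    print("conditioned on a=0:", emission_conditioned(x, 0))
    print("conditioned on a=1:", emission_conditioned(x, 1))
    print("marginalized over a:", emission_marginalized(x))
```

In this sketch, conditioning requires an estimate of the auxiliary value at recognition time, whereas marginalizing uses the auxiliary variable only through the trained parameters, so a noisy estimate is never fed to the recognizer.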

Keywords

#Bayesian methods #Speech recognition #Automatic speech recognition #Degradation #Hidden Markov models #Acoustic emission #Artificial intelligence #Humans #Speech enhancement #Network topology
