Audio-visual speech recognition by speechreading
2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628) - Vol. 2 - pp. 1069-1072
Abstract
Speechreading increases intelligibility in human speech perception. This suggests that conventional acoustic-based speech processing can benefit from the addition of visual information. This paper exploits speechreading for joint audio-visual speech recognition. We first present a color-based feature extraction algorithm that is able to extract salient visual speech features reliably from a frontal view of the talker in a video sequence. Then, a new fusion strategy using a coupled hidden Markov model (CHMM) is proposed to incorporate visual modality into the acoustic subsystem. By maintaining temporal coupling across the two modalities at the feature level and allowing asynchrony in the state at the same time, a CHMM provides a better model for capturing temporal correlations between the two streams of information. The experimental results demonstrate that the combined audio-visual system outperforms the acoustic-only recognizer over a wide range of noise levels.
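To make the coupling idea concrete, here is a minimal sketch of a forward recursion for a two-chain coupled HMM. It is an illustration under assumptions, not the authors' implementation: the abstract does not describe the inference procedure, and the function name `chmm_forward`, the array layouts, and the choice to condition each chain's next state on the previous states of both chains are all hypothetical. Each chain keeps its own state and its own observation likelihood, so the audio and visual states may differ at a given frame (asynchrony), while the transitions tie the two streams together in time.

```python
import numpy as np


def chmm_forward(pi_a, pi_v, A_a, A_v, lik_a, lik_v):
    """Log-likelihood of a two-chain coupled HMM via a scaled forward pass.

    All names and shapes are illustrative assumptions:
      pi_a  : (Na,)         initial audio-state probabilities
      pi_v  : (Nv,)         initial visual-state probabilities
      A_a   : (Na, Nv, Na)  P(a_t = k | a_{t-1} = i, v_{t-1} = j)
      A_v   : (Na, Nv, Nv)  P(v_t = l | a_{t-1} = i, v_{t-1} = j)
      lik_a : (T, Na)       audio observation likelihoods per frame and state
      lik_v : (T, Nv)       visual observation likelihoods per frame and state
    """
    T = lik_a.shape[0]
    # Joint forward variable over (audio state, visual state) at frame 0.
    alpha = np.outer(pi_a * lik_a[0], pi_v * lik_v[0])
    log_lik = 0.0
    for t in range(T):
        if t > 0:
            # Each chain's next state depends on BOTH previous states:
            # pred[k, l] = sum_{i, j} alpha[i, j] * A_a[i, j, k] * A_v[i, j, l]
            pred = np.einsum("ij,ijk,ijl->kl", alpha, A_a, A_v)
            # The two streams emit independently given their own states.
            alpha = pred * np.outer(lik_a[t], lik_v[t])
        # Rescale at every frame to avoid numerical underflow.
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik
```

The recursion works over the joint state space of size Na x Nv, which is exact but grows with the product of the chain sizes. A recognizer would typically train one such model per word or phone and pick the model with the highest log-likelihood.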
Keywords
#Speech recognition #Hidden Markov models #Humans #Speech processing #Feature extraction #Data mining #Video sequences #Maintenance #Streaming media #Audio-visual systems