Audio-visual speech recognition by speechreading

2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628) - Volume 2 - Pages 1069-1072

Xiaozheng Zhang¹, R.M. Mersereau¹, M.A. Clements¹

¹School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA

Abstract

Speechreading increases intelligibility in human speech perception. This suggests that conventional acoustic-based speech processing can benefit from the addition of visual information. This paper exploits speechreading for joint audio-visual speech recognition. We first present a color-based feature extraction algorithm that reliably extracts salient visual speech features from a frontal view of the talker in a video sequence. Then, a new fusion strategy using a coupled hidden Markov model (CHMM) is proposed to incorporate the visual modality into the acoustic subsystem. By maintaining temporal coupling across the two modalities at the feature level while allowing asynchrony between their states, a CHMM provides a better model for capturing temporal correlations between the two streams of information. The experimental results demonstrate that the combined audio-visual system outperforms the acoustic-only recognizer over a wide range of noise levels.
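The coupling idea in the abstract can be illustrated with a minimal sketch of a forward pass for a two-chain coupled HMM. This is not the authors' implementation: the abstract does not give the model's parameterization, so the sketch assumes discrete observation symbols, and the function name `chmm_forward`, the parameter layout, and all numeric values are hypothetical. Each chain keeps its own state, so the audio and visual chains may occupy different states at the same frame (asynchrony), while each chain's next state is conditioned on the previous states of both chains (coupling).

```python
from itertools import product


def chmm_forward(obs_a, obs_v, pi_a, pi_v, trans_a, trans_v, emit_a, emit_v):
    """Return P(obs_a, obs_v) under a two-chain coupled HMM (toy, discrete).

    trans_a[k][l][i] = P(a_t = i | a_{t-1} = k, v_{t-1} = l)
    trans_v[k][l][j] = P(v_t = j | a_{t-1} = k, v_{t-1} = l)
    emit_a[i][o]     = P(audio symbol o  | a_t = i)
    emit_v[j][o]     = P(visual symbol o | v_t = j)
    """
    n_a, n_v = len(pi_a), len(pi_v)
    # Joint state space: one audio state paired with one visual state.
    states = list(product(range(n_a), range(n_v)))

    # Initialization: chains start independently, each emits its own symbol.
    alpha = {
        (i, j): pi_a[i] * pi_v[j] * emit_a[i][obs_a[0]] * emit_v[j][obs_v[0]]
        for i, j in states
    }

    # Recursion: both transitions condition on the previous joint state (k, l).
    for oa, ov in zip(obs_a[1:], obs_v[1:]):
        new_alpha = {}
        for i, j in states:
            reach = sum(
                alpha[k, l] * trans_a[k][l][i] * trans_v[k][l][j]
                for k, l in states
            )
            new_alpha[i, j] = reach * emit_a[i][oa] * emit_v[j][ov]
        alpha = new_alpha

    # Termination: marginalize over the final joint state.
    return sum(alpha.values())


# Hypothetical toy parameters: 2 states and 2 symbols per modality.
pi_a = [0.6, 0.4]
pi_v = [0.5, 0.5]
trans_a = [[[0.7, 0.3], [0.5, 0.5]], [[0.2, 0.8], [0.1, 0.9]]]
trans_v = [[[0.8, 0.2], [0.4, 0.6]], [[0.6, 0.4], [0.3, 0.7]]]
emit_a = [[0.9, 0.1], [0.2, 0.8]]
emit_v = [[0.7, 0.3], [0.4, 0.6]]

likelihood = chmm_forward(
    [0, 1, 1], [0, 0, 1], pi_a, pi_v, trans_a, trans_v, emit_a, emit_v
)
print(likelihood)
```

In a recognizer built along these lines, one such model would be scored per word or sub-word unit and the highest-likelihood unit selected. A practical system would use continuous feature densities and log-domain arithmetic rather than the raw discrete probabilities assumed here.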

Keywords

#Speech recognition #Hidden Markov models #Humans #Speech processing #Feature extraction #Data mining #Video sequences #Maintenance #Streaming media #Audio-visual systems
