Real time face detection for multimodal speech recognition
Proceedings. IEEE International Conference on Multimedia and Expo, Vol. 2, pp. 373-376
Abstract
We propose a real-time system that detects the speaker's frontal face for multimodal speech recognition. It is widely acknowledged that automatic speech recognizers, like humans, can improve recognition performance by adding a visual modality, i.e., the speaker's facial image, to the audio modality. The visual modality also provides inaudible information, such as the speaker's facial orientation and the location of the mouth. To acquire this information, the speaker's face has to be localized in real time. Our system combines skin color detection with spatial feature detection. Color-based detection is fast but depends on the skin and background colors, while spatial feature detection requires more computation. We therefore apply color-based pruning to reduce the search space for the spatial feature detection. By detecting the facial orientation, the proposed method functions as a "face to talk" switch in place of the "push to talk" switch. In our experiment, color-based pruning reduced the search space by 53-97%, and the subsequent spatial detector correctly detected 98.9% of the frontal faces.
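The two-stage idea described in the abstract (a cheap color test that prunes candidate windows before a costlier spatial detector runs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the RGB skin rule is a generic heuristic, and the names `is_skin`, `candidate_windows`, and `min_skin_fraction` are hypothetical. The spatial feature detector itself is not modeled; the sketch only returns the windows that would be passed on to it.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # rows of pixels


def is_skin(p: Pixel) -> bool:
    """Generic RGB skin heuristic (an assumption, not the paper's color model)."""
    r, g, b = p
    return (r > 95 and g > 40 and b > 20
            and max(p) - min(p) > 15
            and abs(r - g) > 15 and r > g and r > b)


def candidate_windows(image: Image, win: int, step: int,
                      min_skin_fraction: float = 0.5):
    """Slide a win x win window over the image and keep only windows whose
    share of skin-colored pixels reaches min_skin_fraction.

    Returns (kept, pruned): the (top, left) corners of the surviving windows
    and the fraction of all windows that were discarded.
    """
    height, width = len(image), len(image[0])
    mask = [[is_skin(p) for p in row] for row in image]
    kept = []
    total = 0
    for top in range(0, height - win + 1, step):
        for left in range(0, width - win + 1, step):
            total += 1
            skin = sum(mask[y][x]
                       for y in range(top, top + win)
                       for x in range(left, left + win))
            if skin >= min_skin_fraction * win * win:
                kept.append((top, left))
    pruned = 1 - len(kept) / total if total else 0.0
    return kept, pruned
```

Only the windows in `kept` would then be handed to the more expensive spatial detector, so `pruned` corresponds to the kind of search-space reduction the abstract reports. The threshold and window size here are placeholders and would need tuning on real images.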
Keywords
#Face detection #Speech recognition #Switches #Computer vision #Face recognition #Image recognition #Automatic speech recognition #Skin #Real time systems #Humans