Modeling focus of attention for meeting indexing based on multiple cues

IEEE Transactions on Neural Networks - Volume 13, Issue 4, pp. 928-938, 2002
R. Stiefelhagen1, Jie Yang2, A. Waibel2
1Institute for Logic, Complexity and Deduction Systems, University of Karlsruhe, Germany
2School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Abstract

A user's focus of attention plays an important role in human-computer interaction applications, such as ubiquitous computing environments and intelligent spaces, where the user's goal and intent have to be continuously monitored. We are interested in modeling people's focus of attention in a meeting situation and propose to model participants' focus of attention from multiple cues. We have developed a system that estimates participants' focus of attention from gaze directions and sound sources. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately and then combines the outputs of the audio- and video-based focus-of-attention predictors. We have evaluated the system on data from three recorded meetings. Adding the acoustic information provided an 8% relative error reduction on average compared to using a single modality. The focus-of-attention model can be used as an index for a multimedia meeting record, and it can also be used for meeting analysis.
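The abstract does not specify how the audio- and video-based predictions are merged. Below is a minimal sketch, assuming each predictor outputs a probability distribution over candidate focus targets (the other participants) and that the two distributions are fused by a simple weighted mixture; the function name, the weight alpha, and the example values are illustrative assumptions, not details taken from the paper.

import numpy as np

def fuse_focus_estimates(p_video: np.ndarray,
                         p_audio: np.ndarray,
                         alpha: float = 0.5) -> np.ndarray:
    """Linearly combine two distributions over focus targets and renormalize.

    p_video: probabilities from the head-pose / gaze-based predictor
    p_audio: probabilities from the sound-source / speaker-based predictor
    alpha:   mixing weight for the video cue (hypothetical parameter)
    """
    assert p_video.shape == p_audio.shape
    combined = alpha * p_video + (1.0 - alpha) * p_audio
    return combined / combined.sum()

# Example with three candidate targets: video favors target 0, audio favors target 1.
p_video = np.array([0.6, 0.3, 0.1])
p_audio = np.array([0.2, 0.7, 0.1])
fused = fuse_focus_estimates(p_video, p_audio)
print(fused)            # fused distribution over focus targets
print(np.argmax(fused)) # index of the predicted focus target

A weighted mixture is only one plausible fusion rule; the same interface would accommodate, for example, a Bayesian product of the two distributions instead.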

Keywords

Indexing; Application software; Collaborative work; Ubiquitous computing; Monitoring; Cameras; Face detection; Neural networks; Microphones; Acoustic signal detection
