Modeling focus of attention for meeting indexing based on multiple cues
Abstract
A user's focus of attention plays an important role in human-computer interaction applications, such as ubiquitous computing environments and intelligent spaces, where the user's goal and intent must be continuously monitored. We are interested in modeling people's focus of attention in a meeting situation, and we propose to model it from multiple cues. We have developed a system that estimates participants' focus of attention from gaze directions and sound sources. An omnidirectional camera simultaneously tracks the faces of the participants around a meeting table, and neural networks estimate their head poses. In addition, microphones detect who is speaking. The system predicts each participant's focus of attention from the acoustic and visual information separately and then combines the outputs of the audio- and video-based predictors. We have evaluated the system on data from three recorded meetings. Adding the acoustic information reduced the error by a relative 8% on average compared with using a single modality. The focus-of-attention model can serve as an index into a multimedia meeting record and can also be used to analyze a meeting.
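The abstract states that the audio- and video-based predictions are combined but does not specify the fusion rule. As a minimal illustrative sketch (not the paper's actual method), a weighted interpolation of the two per-target probability distributions could look like this; the function name, weight value, and example numbers are all hypothetical:

```python
import numpy as np

def fuse_focus_estimates(p_video, p_audio, w_video=0.7):
    """Combine per-target focus-of-attention distributions from two modalities.

    p_video, p_audio: probabilities over the possible focus targets
    (e.g. the other meeting participants), one array per modality.
    w_video: interpolation weight; 0.7 is an assumed value, not taken
    from the paper.
    """
    p_video = np.asarray(p_video, dtype=float)
    p_audio = np.asarray(p_audio, dtype=float)
    fused = w_video * p_video + (1.0 - w_video) * p_audio
    return fused / fused.sum()  # renormalize to a proper distribution

# Example: a four-person meeting; participant A's possible targets are B, C, D.
video = [0.6, 0.3, 0.1]   # head pose (from the camera) points toward B
audio = [0.2, 0.7, 0.1]   # C is currently speaking (from the microphones)
print(fuse_focus_estimates(video, audio))
```

With these example numbers the visual cue dominates and B remains the predicted focus target, but a stronger audio weight would shift the prediction toward the current speaker.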
Keywords
#Indexing #Application software #Collaborative work #Ubiquitous computing #Monitoring #Cameras #Face detection #Neural networks #Microphones #Acoustic signal detection