Beyond the Informedia digital video library: video and audio analysis for remembering conversations

A.G. Hauptmann¹, Wei-Hao Lin¹
¹School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Abstract

The Informedia Project digital video library pioneered the automatic analysis of television broadcast news and its retrieval on demand. Building on that system, we have developed a wearable, personalized Informedia system, which listens to and transcribes the wearer's part of a conversation, recognizes the face of the current dialog partner and remembers his/her voice. The next time the system sees the same person's face and hears the same voice, it can retrieve the audio from the last conversation, replaying in compressed form the names and major issues that were mentioned. All of this happens unobtrusively, somewhat like an intelligent assistant who whispers to you: "That's Bob Jones from Tech Solutions; two weeks ago in London you discussed solar panels". This paper outlines the general system components as well as interface considerations. Initial implementations showed that both face recognition methods and speaker identification technology have serious shortfalls that must be overcome.
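The recall step described above (see the same face, hear the same voice, retrieve the last conversation) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors, the cosine-similarity comparison, the `threshold` value, and the names `Person` and `recall` are all assumptions made for illustration. The abstract does not say how faces or voices are represented or matched.

```python
from dataclasses import dataclass, field
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Person:
    name: str
    face: list    # hypothetical face feature vector
    voice: list   # hypothetical voice feature vector
    conversations: list = field(default_factory=list)  # (when, summary), oldest first


def recall(people, face, voice, threshold=0.9):
    """Return (name, last conversation) when BOTH face and voice match, else None."""
    best, best_score = None, threshold
    for p in people:
        # Use the weaker of the two similarities so that one modality
        # alone cannot trigger a match (an assumed fusion rule).
        score = min(cosine(p.face, face), cosine(p.voice, voice))
        if score >= best_score:
            best, best_score = p, score
    if best is None or not best.conversations:
        return None
    return best.name, best.conversations[-1]
```

Taking the minimum of the two scores is one simple way to require agreement between face and voice; given the shortfalls the abstract reports for both technologies, a real system would likely need a more careful fusion rule.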

Keywords

#Software libraries #Digital video broadcasting #Digital audio broadcasting #Face recognition #Speech recognition #Space technology #Global Positioning System #Computer science #TV broadcasting #Automatic speech recognition
