Beyond the Informedia digital video library: video and audio analysis for remembering conversations
Abstract
The Informedia Project digital video library pioneered the automatic analysis of television broadcast news and its retrieval on demand. Building on that system, we have developed a wearable, personalized Informedia system, which listens to and transcribes the wearer's part of a conversation, recognizes the face of the current dialog partner and remembers his/her voice. The next time the system sees the same person's face and hears the same voice, it can retrieve the audio from the last conversation, replaying in compressed form the names and major issues that were mentioned. All of this happens unobtrusively, somewhat like an intelligent assistant who whispers to you: "That's Bob Jones from Tech Solutions; two weeks ago in London you discussed solar panels". This paper outlines the general system components as well as interface considerations. Initial implementations showed that both face recognition methods and speaker identification technology have serious shortfalls that must be overcome.
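The recall step described above can be illustrated with a minimal sketch. It assumes that upstream face recognition and speaker identification each return an opaque identifier (or nothing when unsure), and that a reminder is only played when both agree on a known person. The names `Conversation`, `ConversationMemory`, `remember`, and `recall`, and the identifier strings, are hypothetical; the abstract does not specify the system's actual data structures.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Conversation:
    """Compressed record of one conversation (hypothetical structure)."""
    when: date
    place: str
    names: List[str]
    topics: List[str]


@dataclass
class ConversationMemory:
    """Stores conversations keyed by a (face_id, voice_id) pair, oldest first."""
    _store: Dict[Tuple[str, str], List[Conversation]] = field(default_factory=dict)

    def remember(self, face_id: str, voice_id: str, conv: Conversation) -> None:
        # Append the conversation to this partner's history.
        self._store.setdefault((face_id, voice_id), []).append(conv)

    def recall(self, face_id: Optional[str], voice_id: Optional[str]) -> Optional[str]:
        # Assumption: both modalities must produce an identifier; if either
        # recognizer is unsure (None), stay silent rather than guess.
        if face_id is None or voice_id is None:
            return None
        history = self._store.get((face_id, voice_id))
        if not history:
            return None
        last = history[-1]  # most recent conversation with this partner
        return (
            f"That's {', '.join(last.names)}; on {last.when.isoformat()} "
            f"in {last.place} you discussed {', '.join(last.topics)}"
        )


if __name__ == "__main__":
    memory = ConversationMemory()
    memory.remember(
        "face-17", "voice-04",
        Conversation(date(2000, 5, 1), "London", ["Bob Jones"], ["solar panels"]),
    )
    print(memory.recall("face-17", "voice-04"))
```

Requiring both identifiers to match is one conservative design choice, given that the abstract reports serious shortfalls in both recognizers; a real system might instead combine confidence scores from the two modalities.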
Keywords
#Software libraries #Digital video broadcasting #Digital audio broadcasting #Face recognition #Speech recognition #Space technology #Global Positioning System #Computer science #TV broadcasting #Automatic speech recognition