Multi-modal extraction of highlights from TV Formula 1 programs

M. Petkovic1, V. Mihajlovic2, W. Jonker1, S. Djordjevic-Kajan2
1Computer Science Department, University of Twente, Enschede, Netherlands
2Computer Science Department, University of Nis, Nis, Yugoslavia

Abstract

As the amount of publicly available video data grows, the need to automatically infer semantics from raw video data becomes significant. In this paper, we focus on the use of dynamic Bayesian networks (DBNs) for that purpose, and demonstrate how they can be effectively applied to fuse evidence obtained from different media information sources. The approach is validated in the particular domain of Formula 1 race videos. For that specific domain we introduce a robust audiovisual feature extraction scheme and a text detection and recognition method. Based on numerous experiments performed with DBNs, we give some recommendations with respect to the modeling of temporal and atemporal dependencies within the network. Finally, we present experimental results for the detection of excited speech and the extraction of highlights, as well as the advantageous query capabilities of our system.
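To make the fusion idea concrete, the following is a minimal sketch (not the authors' implementation) of a two-slice DBN reduced to an HMM-style forward filter: it combines three discretized per-segment cues (excited speech, a replay/motion cue, and a superimposed-text cue) to estimate a hidden "highlight" state. All modality names, probability values, and the conditional-independence assumption between modalities are illustrative placeholders, not values or structure taken from the paper.

```python
# Minimal DBN-fusion sketch: forward filtering over a hidden "highlight" state
# with three binary observation streams. Probabilities are illustrative only.
import numpy as np

states = ["no_highlight", "highlight"]

# Transition model P(S_t | S_{t-1}): highlights tend to persist for a few segments.
T = np.array([[0.95, 0.05],
              [0.30, 0.70]])

# Per-modality observation models P(o_t | S_t); rows index the hidden state,
# columns index the binary observation value (cue absent / present).
obs_models = {
    "excited_speech": np.array([[0.85, 0.15],
                                [0.25, 0.75]]),
    "replay_or_motion": np.array([[0.90, 0.10],
                                  [0.35, 0.65]]),
    "text_overlay": np.array([[0.80, 0.20],
                              [0.40, 0.60]]),
}

def filter_highlights(observations, prior=np.array([0.9, 0.1])):
    """Return P(highlight | evidence up to t) for each video segment.

    observations: list of dicts mapping modality name -> 0/1 cue value.
    """
    belief = prior.copy()
    result = []
    for obs in observations:
        # Predict step: propagate the belief through the transition model.
        belief = belief @ T
        # Update step: multiply in each modality's likelihood
        # (modalities are assumed conditionally independent given the state).
        for modality, value in obs.items():
            belief *= obs_models[modality][:, value]
        belief /= belief.sum()
        result.append(belief[1])
    return result

if __name__ == "__main__":
    segments = [
        {"excited_speech": 0, "replay_or_motion": 0, "text_overlay": 0},
        {"excited_speech": 1, "replay_or_motion": 1, "text_overlay": 0},
        {"excited_speech": 1, "replay_or_motion": 1, "text_overlay": 1},
        {"excited_speech": 0, "replay_or_motion": 0, "text_overlay": 1},
    ]
    for t, p in enumerate(filter_highlights(segments)):
        print(f"segment {t}: P(highlight) = {p:.3f}")
```

The point of the sketch is only the fusion mechanism: each modality contributes a likelihood term, and the temporal transition model smooths the posterior over consecutive segments.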

Keywords

TV, Speech, Data mining, Bayesian methods, Robustness, Event detection, Feature extraction, Computer science, Text recognition, Cepstral analysis
