Exploiting semantics on external resources to gather visual examples for video retrieval

David Vallet¹, Iván Cantador¹, Joemon M. Jose²
¹Universidad Autónoma de Madrid, Madrid, Spain
²University of Glasgow, Glasgow, UK

Abstract

With the huge and ever-growing amount of video content available on the Web, there is a need to provide video retrieval functionalities over very large collections. Most current Web video retrieval systems rely on manual textual annotations to offer keyword-based search interfaces. These systems face two problems: users are often reluctant to provide annotations, and the quality of the annotations they do provide is frequently questionable. A common alternative is to ask the user for an example image and exploit its low-level features to find videos whose keyframes are visually similar. The main limitation of this approach is the so-called semantic gap: low-level image features often do not match the actual semantics of the videos. Moreover, the approach may burden the user, who has to find relevant visual examples and supply them to the system. Aiming to address these limitations, in this paper we present a hybrid video retrieval technique that automatically obtains visual examples by performing textual searches on external knowledge sources, such as DBpedia, Flickr and Google Images, which differ in coverage and structure. Our approach exploits the semantics underlying these knowledge sources to tackle the semantic gap problem. We have conducted evaluations to assess the quality of the visual examples retrieved from the above external knowledge sources. The results suggest that external knowledge can provide valid visual examples for a keyword-based query and, when the user explicitly provides visual examples, can supply complementary examples that improve video search performance.
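The pipeline described above can be illustrated with a minimal sketch. The following is not the authors' implementation: the external source (e.g. a Flickr or Google Images keyword search) is stubbed as a hard-coded list of images, images are represented as small RGB pixel lists, and the low-level feature is a plain quantized color histogram compared by cosine similarity; the real system uses richer descriptors and live queries against DBpedia, Flickr and Google Images.

```python
# Hypothetical sketch: rank video keyframes against visual examples
# obtained from an external keyword search. All data here is stubbed.
from collections import Counter
import math

def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels and normalize counts."""
    step = 256 // bins
    hist = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(hist.values())
    return {k: v / total for k, v in hist.items()}

def cosine_similarity(h1, h2):
    """Cosine similarity between two sparse histograms."""
    dot = sum(v * h2.get(k, 0.0) for k, v in h1.items())
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def rank_keyframes(visual_examples, keyframes):
    """Score each video keyframe by its best match against any example."""
    example_hists = [color_histogram(img) for img in visual_examples]
    scores = {}
    for video_id, frame in keyframes.items():
        frame_hist = color_histogram(frame)
        scores[video_id] = max(cosine_similarity(h, frame_hist)
                               for h in example_hists)
    return sorted(scores, key=scores.get, reverse=True)

# Stubbed "external source" result: images returned for a keyword query.
red_image = [(250, 10, 10)] * 16       # e.g. an image found for "sunset"
keyframes = {
    "video_a": [(240, 20, 15)] * 16,   # reddish keyframe
    "video_b": [(10, 10, 250)] * 16,   # bluish keyframe
}
ranking = rank_keyframes([red_image], keyframes)
print(ranking)  # video_a ranks first: its keyframe matches the example
```

The point of the sketch is the decoupling: the user supplies only keywords, the external source turns keywords into images, and only then does low-level matching take place, so the user never has to hunt for visual examples manually.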
