Exploiting semantics on external resources to gather visual examples for video retrieval
Abstract
With the huge and ever-rising amount of video content available on the Web, there is a need for video retrieval over very large collections. Most current Web video retrieval systems rely on manual textual annotations to provide keyword-based search interfaces. These systems face two problems: users are often reluctant to provide annotations, and the quality of such annotations is questionable in many cases. A commonly used alternative is to ask the user for an example image and exploit its low-level features to find video content whose keyframes are similar to that image. In this case, the main limitation is the so-called semantic gap: low-level image features often do not match the real semantics of the videos. Moreover, this approach may burden the user, as it requires finding relevant visual examples and providing them to the system. To address these limitations, in this paper we present a hybrid video retrieval technique that automatically obtains visual examples by performing textual searches on external knowledge sources, such as DBpedia, Flickr and Google Images, which differ in coverage and structure. Our approach exploits the semantics underlying these knowledge sources to address the semantic gap problem. We have conducted evaluations to assess the quality of the visual examples retrieved from these sources. The results suggest that external knowledge can provide valid visual examples from a keyword-based query and that, when the user supplies visual examples explicitly, it can provide additional examples that complement them to improve video search performance.
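The general flow described above (keyword query, textual search on external sources, visual examples, keyframe matching) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `gather_examples` and `rank_videos`, the plain feature vectors, the cosine similarity, and the max-similarity scoring rule are all assumptions made for illustration. Real sources such as DBpedia or Flickr are replaced here by a hypothetical in-memory stub.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gather_examples(
    query: str, sources: Dict[str, Callable[[str], List[Vector]]]
) -> List[Vector]:
    """Run the keyword query against every source and pool the returned
    example features. Each source is a hypothetical callable standing in
    for a textual search on an external knowledge source."""
    examples: List[Vector] = []
    for search in sources.values():
        examples.extend(search(query))
    return examples


def rank_videos(
    examples: List[Vector], keyframes: Dict[str, List[Vector]]
) -> List[str]:
    """Rank videos by the best similarity between any gathered example and
    any of the video's keyframes (an assumed scoring rule)."""
    scores = {
        video_id: max(
            (cosine(e, f) for e in examples for f in frames), default=0.0
        )
        for video_id, frames in keyframes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: one stub source and two videos with one keyframe each.
sources = {"toy_source": lambda q: [[1.0, 0.0, 0.0]] if q == "sunset" else []}
keyframes = {"v1": [[0.9, 0.1, 0.0]], "v2": [[0.0, 1.0, 0.0]]}
ranking = rank_videos(gather_examples("sunset", sources), keyframes)
print(ranking)
```

In this sketch the user only types a keyword. The visual examples come from the stub source, so no image has to be supplied by hand. Examples provided explicitly by the user could simply be appended to the pooled list before ranking.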