CoMMA: a framework for integrated multimedia mining using multi-relational associations

Knowledge and Information Systems - Volume 10 - Pages 135-162 - 2005
Ankur M. Teredesai, Muhammad A. Ahmad, Juveria Kanodia, Roger S. Gaborski
Department of Computer Science, Rochester Institute of Technology, Rochester, USA

Abstract

Automatically generating captions or annotations for still images is a challenging task. Traditionally, scene understanding has relied on higher-level (semantic) object detection and complex feature extraction, and text descriptions are then generated for a given image on the basis of that understanding. In this paper, we pose the auto-annotation problem as one of multi-relational association rule mining, where the relations exist between image-based features and textual annotations. The central idea is to combine low-level image features such as color, orientation, and intensity with the corresponding text annotations to generate association rules across multiple tables using multi-relational association mining. We then use these association rules to auto-annotate test images. We also present a multi-relational extension to the FP-tree algorithm to accomplish the association rule mining task effectively. The motivation for using multi-relational association rule mining for multimedia data mining is to demonstrate the potential offered by multiple descriptions of the same image (such as multiple people labeling the same image differently). Moreover, multi-relational association rule mining can benefit the auto-annotation process by reducing the number of trivial associations that would be generated if text and image features were combined in a single table through a join. We discuss these issues and the results of our auto-annotation experiments on different test sets. Another contribution of this paper is to highlight the need for robust evaluation metrics for the image annotation task. We propose several applicable scoring techniques and then evaluate the performance of the different algorithms to study the utility of these techniques. The paper concludes with a detailed analysis of the datasets used and the performance results.
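The central idea described in the abstract, rules that link image features in one table to annotation words in another, can be illustrated with a minimal sketch. This is not the multi-relational FP-tree extension the paper presents: it counts co-occurrences by brute force and pairs the two relations through a shared image id. The toy data, the feature encoding (for example `color=blue`), the thresholds, and the function names `mine_rules` and `annotate` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two separate relations keyed by image id.
features = {
    "img1": {"color=blue", "intensity=high"},
    "img2": {"color=blue", "intensity=high", "orientation=horizontal"},
    "img3": {"color=green", "intensity=low"},
    "img4": {"color=green", "intensity=low", "orientation=vertical"},
}
annotations = {
    "img1": {"sky"},
    "img2": {"sky", "sea"},
    "img3": {"grass"},
    "img4": {"grass", "tree"},
}


def mine_rules(features, annotations, min_support=2, min_conf=0.8, max_len=2):
    """Brute-force mining of rules of the form: feature itemset -> word."""
    body_count = Counter()  # images containing the feature itemset
    rule_count = Counter()  # images containing the itemset and the word
    for image_id, items in features.items():
        words = annotations.get(image_id, set())
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_count[body] += 1
                for word in words:
                    rule_count[(body, word)] += 1
    rules = []
    for (body, word), support in rule_count.items():
        confidence = support / body_count[body]
        if support >= min_support and confidence >= min_conf:
            rules.append((frozenset(body), word, support, confidence))
    return rules


def annotate(image_items, rules):
    """Rank words by the best confidence among rules whose body matches."""
    scores = {}
    for body, word, _support, confidence in rules:
        if body <= image_items:
            scores[word] = max(scores.get(word, 0.0), confidence)
    return sorted(scores, key=lambda w: (-scores[w], w))


rules = mine_rules(features, annotations)
predicted = annotate({"color=blue", "intensity=high"}, rules)
print(predicted)
```

Restricting rule heads to annotation words and rule bodies to image features is one simple way to avoid rules that relate features only to other features. Whether this matches the kind of pruning the paper attributes to its multi-relational formulation is an assumption.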
