A context-aware semantic modeling framework for efficient image retrieval

K. S. Arun¹, V. K. Govindan¹
¹Department of Computer Science and Engineering, National Institute of Technology Calicut, India

Abstract

In recent years, high-level image representations have been gaining popularity in image classification and retrieval tasks. This paper proposes an efficient scheme, the semantic context model, to derive high-level image descriptors well suited to retrieval. The semantic context model uses an undirected graphical model formulation that jointly exploits low-level visual features and contextual information to classify local image blocks into predefined concept classes. Contextual information comprises concept co-occurrences and their spatial correlation statistics. More expressive potential functions are introduced to capture the structural dependencies among semantic concepts. The proposed framework proceeds in three steps. First, optimal values of the model parameters that impose spatial consistency of concept labels among local image blocks are learned from the training data. Then, the semantics associated with the constituent blocks of an unseen image are inferred using an improved message-passing algorithm. Finally, a compact but discriminative image signature is derived by integrating the frequency of occurrence of the various regional semantics. Experimental results on several benchmark datasets show that the semantic context model can effectively resolve local ambiguities and consequently improve concept recognition performance in complex images. Moreover, the retrieval efficiency of the new semantics-based image feature is found to be much better than that of state-of-the-art approaches.
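
As a rough illustration of the final step only, the sketch below turns a grid of per-block concept labels into a normalized frequency histogram and compares two such signatures. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `concept_signature` and `histogram_intersection`, the integer label encoding, the toy grids, and the choice of histogram intersection as the similarity measure are all assumptions made for illustration. The labels are taken as given, so the parameter learning and message-passing inference steps are not shown.

```python
from collections import Counter


def concept_signature(block_labels, num_concepts):
    """Return the normalized frequency of each concept over an image's blocks.

    block_labels: 2-D list of integer concept labels, one per image block
                  (assumed to come from some earlier inference step).
    num_concepts: number of predefined concept classes (labels 0..num_concepts-1).
    """
    flat = [label for row in block_labels for label in row]
    if not flat:
        raise ValueError("image has no labeled blocks")
    counts = Counter(flat)
    total = len(flat)
    return [counts.get(c, 0) / total for c in range(num_concepts)]


def histogram_intersection(sig_a, sig_b):
    """Similarity of two normalized signatures; 1.0 means identical histograms."""
    return sum(min(a, b) for a, b in zip(sig_a, sig_b))


# Hypothetical 2x3 grids of inferred concept labels (3 concept classes).
query = [[0, 0, 1], [2, 2, 2]]
candidate = [[0, 1, 1], [2, 2, 2]]

sig_q = concept_signature(query, 3)
sig_c = concept_signature(candidate, 3)
score = histogram_intersection(sig_q, sig_c)
```

The signature has one entry per concept class, so its length does not depend on image size or on the number of blocks. This is one way to read "compact" in the abstract.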
