A context-aware semantic modeling framework for efficient image retrieval
Abstract
In recent years, high-level image representations have gained popularity in image classification and retrieval tasks. This paper proposes an efficient scheme, the semantic context model, for deriving high-level image descriptors that are well suited to retrieval. The semantic context model uses an undirected graphical model formulation that jointly exploits low-level visual features and contextual information to classify local image blocks into a set of predefined concept classes. The contextual information consists of concept co-occurrences and their spatial correlation statistics. More expressive potential functions are introduced to capture the structural dependencies among the semantic concepts. The proposed framework proceeds in three steps. First, optimal values of the model parameters, which impose spatial consistency of concept labels among local image blocks, are learned from the training data. Then, the semantics associated with the constituent blocks of an unseen image are inferred using an improved message-passing algorithm. Finally, a compact but discriminative image signature is derived by integrating the frequency of occurrence of the regional semantics. Experimental results on several benchmark datasets show that the semantic context model can effectively resolve local ambiguities and consequently improve concept recognition performance in complex images. Moreover, the retrieval efficiency of the new semantics-based image feature is found to be substantially higher than that of state-of-the-art approaches.
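The third step, building an image signature from the frequency of block-level concepts, can be illustrated with a minimal Python sketch. It assumes the concept label of each block has already been inferred. The concept names, the helper functions `image_signature`, `histogram_intersection` and `rank`, and the use of histogram intersection as the similarity measure are illustrative assumptions, not details taken from the abstract.

```python
from collections import Counter

# Hypothetical concept vocabulary; the abstract does not list the actual classes.
CONCEPTS = ["sky", "water", "grass", "sand", "rocks"]


def image_signature(block_labels, concepts=CONCEPTS):
    """Return the normalized frequency of each concept among an image's blocks."""
    total = len(block_labels)
    if total == 0:
        return [0.0] * len(concepts)
    counts = Counter(block_labels)
    return [counts[c] / total for c in concepts]


def histogram_intersection(sig_a, sig_b):
    """Similarity in [0, 1] between two normalized signatures (assumed measure)."""
    return sum(min(a, b) for a, b in zip(sig_a, sig_b))


def rank(query_sig, database):
    """Order database image names by decreasing similarity to the query."""
    return sorted(
        database,
        key=lambda name: histogram_intersection(query_sig, database[name]),
        reverse=True,
    )


# Toy example: each image is a list of per-block concept labels (made up).
query = image_signature(["sky", "sky", "water", "sand"])
database = {
    "meadow": image_signature(["sky", "grass", "grass", "grass"]),
    "beach": image_signature(["sky", "water", "water", "sand"]),
}
ranking = rank(query, database)
```

In this toy setup the signature has one entry per concept, so its length is independent of image size, which is what makes such a descriptor compact. How the paper weights or integrates the frequencies may differ from this plain normalized count.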