Adding spatial distribution clue to aggregated vector in image retrieval

Springer Science and Business Media LLC - Volume 2018 - Pages 1-14 - 2018
Pingping Liu1,2, Zhuang Miao1, Huili Guo1, Yeran Wang1, Ni Ai1
1College of Computer Science and Technology, Jilin University, Changchun, China
2School of Mechanical Science and Engineering, Jilin University, Changchun, China

Abstract

This study proposes a novel algorithm that enhances the distinctiveness of the traditional vector of locally aggregated descriptors (VLAD) using the spatial distribution clues of local features. The algorithm introduces a new method for computing the spatial distribution entropy (SDE) of clusters. Unlike conventional methods, which use only statistics of the spatial coordinates, this algorithm considers the distribution of the full spatial information provided by local feature detectors. For each cluster, the spatial distribution is computed as a histogram of the spatial locations, scales, and orientations of all local features assigned to that cluster. The entropies of the spatial distributions of all clusters in an image form a distribution function, which is normalized and concatenated with the VLAD vector to produce the final representation. Image retrieval and classification experiments are performed on public datasets. Experimental results show that the proposed algorithms achieve retrieval performance better than or comparable to that of several state-of-the-art algorithms. In addition, we extend SDE to convolutional neural network (CNN) features, which further improves their image retrieval performance.
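The pipeline summarized above (VLAD aggregation, a per-cluster histogram of keypoint location, scale, and orientation, Shannon entropy per histogram, normalization, concatenation) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the use of separate marginal histograms per attribute (rather than a joint one), the bin count `bins`, the normalization of x and y to [0, 1], the signed-square-root plus L2 normalization of VLAD, and the `weight` factor applied to the SDE part are all hypothetical choices made for this example.

```python
import numpy as np


def vlad(descriptors, centroids):
    """VLAD: per-cluster sum of residuals to the nearest centroid.

    descriptors: (N, D) local descriptors; centroids: (K, D) codebook.
    Returns the normalized (K*D,) vector and each descriptor's cluster index.
    """
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)  # hard assignment to the nearest centroid
    K, D = centroids.shape
    v = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            v[k] = (members - centroids[k]).sum(axis=0)
    v = v.ravel()
    # Assumption: signed square-root (power) normalization, then L2.
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return (v / norm if norm > 0 else v), assign


def entropy(hist):
    """Shannon entropy in bits of a histogram of counts (0 if empty)."""
    total = hist.sum()
    if total == 0:
        return 0.0
    p = hist[hist > 0] / total
    return float((p * np.log2(1.0 / p)).sum())


def spatial_distribution_entropy(keypoints, assign, K, bins=4):
    """Hypothetical SDE: for each cluster, the entropy of the marginal
    histograms of x, y, scale and orientation of its keypoints.

    keypoints: (N, 4) array of (x, y, scale, orientation), with x and y
    assumed normalized to [0, 1] and orientation in [0, 2*pi).
    Returns a (K*4,) vector.
    """
    ranges = [(0.0, 1.0), (0.0, 1.0),
              (keypoints[:, 2].min(), keypoints[:, 2].max()),
              (0.0, 2 * np.pi)]
    sde = np.zeros((K, 4))
    for k in range(K):
        members = keypoints[assign == k]
        for a, (lo, hi) in enumerate(ranges):
            hist, _ = np.histogram(members[:, a], bins=bins, range=(lo, hi))
            sde[k, a] = entropy(hist)
    return sde.ravel()


def sde_vlad(descriptors, keypoints, centroids, weight=1.0, bins=4):
    """Concatenate normalized VLAD with the L2-normalized SDE vector.

    'weight' is a hypothetical knob for the SDE block's influence.
    """
    v, assign = vlad(descriptors, centroids)
    s = spatial_distribution_entropy(keypoints, assign, len(centroids), bins)
    norm = np.linalg.norm(s)
    if norm > 0:
        s = s / norm
    return np.concatenate([v, weight * s])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centroids = rng.normal(size=(8, 16))
    descriptors = rng.normal(size=(200, 16))
    keypoints = np.column_stack([
        rng.random(200), rng.random(200),          # normalized x, y
        rng.uniform(1.0, 8.0, 200),                # scale
        rng.uniform(0.0, 2 * np.pi, 200),          # orientation
    ])
    rep = sde_vlad(descriptors, keypoints, centroids)
    print(rep.shape)
```

With 8 centroids and 16-dimensional descriptors, the demo produces 8 × 16 VLAD dimensions plus 8 × 4 SDE dimensions, i.e. a 160-dimensional vector. Normalizing the SDE block on its own before concatenation keeps its scale independent of the VLAD part, so a single factor such as `weight` can control how much the spatial clue contributes to distances between images.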
