Restricted Boltzmann machine as an aggregation technique for binary descriptors
Abstract
The article presents a novel approach to real-time image classification with deep neural networks. The proposed architecture exploits computationally efficient local binary descriptors and uses a restricted Boltzmann machine (RBM) as a feature space projection step, so that the depth of the resulting network can be reduced. A contrastive divergence procedure is used both for RBM training and for feature projection. The resulting networks perform close to the current state of the art while having a small memory footprint (i.e., a low number of parameters) and a low computational cost (i.e., a short response time). The low number of parameters makes these architectures applicable in embedded systems with limited memory or reduced computational capabilities.
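The projection step can be illustrated with a minimal sketch of a Bernoulli RBM trained with one-step contrastive divergence (CD-1) on binary descriptors. This is not the article's implementation: the class name `BinaryRBM`, the 256-bit descriptor length, the 32 hidden units, the learning rate, the random stand-in data, and the mean pooling of hidden activations into one vector per image are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BinaryRBM:
    """Bernoulli-Bernoulli RBM with CD-1 updates (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        # Small random weights, zero biases.
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        # P(h_j = 1 | v) for every hidden unit.
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        # P(v_i = 1 | h) for every visible unit.
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        pv1 = self.visible_probs(h0)
        v1 = (self.rng.random(pv1.shape) < pv1).astype(float)
        ph1 = self.hidden_probs(v1)
        # CD-1 update: data statistics minus reconstruction statistics.
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
        # Mean squared reconstruction error, useful only as a rough monitor.
        return float(np.mean((v0 - pv1) ** 2))


# Random bits standing in for 500 local binary descriptors of 256 bits each
# (hypothetical data; real descriptors would come from e.g. BRIEF or ORB).
data_rng = np.random.default_rng(1)
descriptors = (data_rng.random((500, 256)) < 0.5).astype(float)

rbm = BinaryRBM(n_visible=256, n_hidden=32)
for _ in range(5):
    reconstruction_error = rbm.cd1_step(descriptors)

# Projection: hidden-unit probabilities serve as the low-dimensional features.
features = rbm.hidden_probs(descriptors)
# Assumed aggregation: average the projected descriptors into one image vector.
image_vector = features.mean(axis=0)
```

With these assumed sizes the RBM holds 256 × 32 weights plus 288 biases, which gives a sense of how such a projection layer can keep the parameter count small before any classifier is attached.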
