Hierarchical Gaussian descriptor based on local pooling for action recognition

Machine Vision and Applications - Volume 30, Issue 2 - Pages 321-343 - 2019
X. Nguyen (1), Abdel‐Illah Mouaddib (1), Thanh Phương Nguyễn (2, 3)
(1) CNRS, GREYC, UMR 6072, Université de Caen Basse-Normandie, Caen, France
(2) CNRS, ENSAM, LSIS, UMR 7296, Aix Marseille Université, Marseille, France
(3) CNRS, LSIS, UMR 7296, Université de Toulon, La Garde, France


References

Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: application to face recognition. TPAMI 28(12), 2037–2041 (2006)

Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Geometric means in a novel vector space structure on symmetric positive definite matrices. SIAM J. Matrix Anal. Appl. 29(1), 328–347 (2007)

Bilinski, P., Bremond, F.: Video covariance matrix logarithm for human action recognition in videos. In: IJCAI, pp. 2140–2147 (2015)

Boureau, Y.L., Roux, N.L., Bach, F., Ponce, J., LeCun, Y.: Ask the locals: multi-way local pooling for image recognition. In: ICCV, pp. 2651–2658 (2011)

Cao, Z., Simon, T., Wei, S., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: CVPR, pp. 1302–1310 (2017)

Cavazza, J., Zunino, A., Biagio, M.S., Murino, V.: Kernelized covariance for action recognition. In: ICPR, pp. 408–413 (2016)

Chen, C., Jafari, R., Kehtarnavaz, N.: Action recognition from depth sequences using depth motion maps-based local binary patterns. In: WACV, pp. 1092–1099 (2015)

Chen, C., Liu, K., Kehtarnavaz, N.: Real-time human action recognition based on depth motion maps. J. Real Time Image Process. 12(1), 155–163 (2016)

Cirujeda, P., Binefa, X.: 4DCov: a nested covariance descriptor of spatio-temporal features for gesture recognition in depth sequences. In: 3DV, vol. 1, pp. 657–664 (2014)

Coates, A., Ng, A.Y.: The importance of encoding versus training with sparse coding and vector quantization. In: ICML, pp. 921–928 (2011)

Wang, R., Guo, H., Davis, L.S., Dai, Q.: Covariance discriminative learning: a natural and efficient approach to image set classification. In: CVPR, pp. 2496–2503 (2012)

Devanne, M., Wannous, H., Berretti, S., Pala, P., Daoudi, M., Bimbo, A.D.: 3-d human action recognition by shape analysis of motion trajectories on Riemannian manifold. IEEE Trans. Cybern. 45(7), 1340–1352 (2015)

Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72 (2005)

Du, Y., Wang, W., Wang, L.: Hierarchical recurrent neural network for skeleton based action recognition. In: CVPR, pp. 1110–1118 (2015)

Evangelidis, G., Singh, G., Horaud, R.: Skeletal quads: human action recognition using joint quadruples. In: ICPR, pp. 4513–4518 (2014)

Fan, K.C., Hung, T.Y.: A novel local pattern descriptor—local vector pattern in high-order derivative space for face recognition. IEEE Trans. Image Process. 23, 2877–2891 (2014)

Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)

Gao, Z., Zhang, H., Xu, G., Xue, Y.: Multi-perspective and multi-modality joint representation and recognition model for 3D action recognition. Neurocomputing 151, 554–564 (2015)

Gong, L., Wang, T., Liu, F.: Shape of Gaussians as feature descriptors. In: CVPR, pp. 2366–2371 (2009)

Gowayyed, M.A., Torki, M., Hussein, M.E., El-Saban, M.: Histogram of oriented displacements (HOD): describing trajectories of human joints for action recognition. In: IJCAI, pp. 1351–1357 (2013)

Guo, K., Ishwar, P., Konrad, J.: Action recognition from video using feature covariance matrices. IEEE Trans. Image Process. 22(6), 2479–2494 (2013)

Harandi, M.T., Salzmann, M., Hartley, R.: From manifold to manifold: geometry-aware dimensionality reduction for SPD matrices. In: ECCV, pp. 17–32 (2014)

Harandi, M.T., Sanderson, C., Sanin, A., Lovell, B.C.: Spatio-temporal covariance descriptors for action and gesture recognition. In: WACV, pp. 103–110 (2013)

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)

Holte, M.B., Moeslund, T.B., Fihl, P.: View-invariant gesture recognition using 3D optical flow and harmonic motion context. CVIU 114(12), 1353–1361 (2010)

Huang, Z., Gool, L.V.: A Riemannian network for SPD matrix learning. In: AAAI, pp. 2036–2042 (2017)

Huang, Z., Wan, C., Probst, T., Gool, L.V.: Deep learning on Lie groups for skeleton-based action recognition. In: CVPR (2017)

Huang, Z., Wu, J., Gool, L.V.: Building deep networks on Grassmann manifolds. In: AAAI (2018)

Hussein, M.E., Torki, M., Gowayyed, M.A., El-Saban, M.: Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. In: IJCAI, pp. 2466–2472 (2013)

Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: ICCV, pp. 2146–2153 (2009)

Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106(4), 620–630 (1957)

Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: CVPR, pp. 3304–3311 (2010)

Klaser, A., Marszalek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. In: BMVC, pp. 1–10 (2008)

Kurakin, A., Zhang, Z., Liu, Z.: A real time system for dynamic hand gesture recognition with a depth sensor. In: EUSIPCO, pp. 1975–1979 (2012)

Lee, H., Grosse, R., Ranganath, R., Ng, A.Y.: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: ICML, pp. 609–616 (2009)

Li, P., Wang, Q.: Local log-Euclidean covariance matrix (L2ECM) for image representation and its applications. In: ECCV, pp. 469–482 (2012)

Li, P., Wang, Q., Zeng, H., Zhang, L.: Local log-Euclidean multivariate Gaussian descriptor and its application to image classification. TPAMI 39(4), 803–817 (2017)

Li, P., Zeng, H., Wang, Q., Shiu, S.C.K., Zhang, L.: High-order local pooling and encoding Gaussians over a dictionary of Gaussians. IEEE Trans. Image Process. 26(7), 3372–3384 (2017)

Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3D points. In: CVPRW, pp. 9–14 (2010)

Lin, Y.C., Hu, M.C., Cheng, W.H., Hsieh, Y.H., Chen, H.M.: Human action recognition and retrieval using sole depth information. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 1053–1056 (2012)

Liu, A., Nie, W., Su, Y., Ma, L., Hao, T., Yang, Z.: Coupled hidden conditional random fields for RGB-D human action recognition. Signal Process. 112(C), 74–82 (2015)

Liu, C., Hu, Y., Li, Y., Song, S., Liu, J.: PKU-MMD: a large scale benchmark for continuous multi-modal human action understanding. CoRR (2017). arXiv:1703.07475

Liu, J., Wang, G., Hu, P., Duan, L.Y., Kot, A.C.: Global context-aware attention LSTM networks for 3D action recognition. In: CVPR, pp. 3671–3680 (2017)

Liu, L., Shao, L.: Learning discriminative representations from RGB-D video data. In: IJCAI, pp. 1493–1500 (2013)

Liu, M., Liu, H., Chen, C.: 3D action recognition using multi-scale energy-based global ternary image. IEEE Trans. Circuits Syst. Video Technol. 28(8), 1824–1838 (2018)

Lovrić, M., Min-Oo, M., Ruh, E.A.: Multivariate normal distributions parametrized as a Riemannian symmetric space. J. Multivar. Anal. 74(1), 36–48 (2000)

Luo, C., Ma, C., Wang, C., Wang, Y.: Learning discriminative activated simplices for action recognition. In: AAAI, pp. 4211–4217 (2017)

Luo, J., Wang, W., Qi, H.: Group sparsity and geometry constrained dictionary learning for action recognition from depth maps. In: ICCV, pp. 1809–1816 (2013)

Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online dictionary learning for sparse coding. In: ICML, pp. 689–696 (2009)

Mairal, J., Bach, F., Ponce, J., Sapiro, G., Jenatton, R., Obozinski, G.: SPAMS: SPArse modeling software, v2.4 (2014). http://spams-devel.gforge.inria.fr/downloads.html

Matsukawa, T., Okabe, T., Suzuki, E., Sato, Y.: Hierarchical Gaussian descriptor for person re-identification. In: CVPR, pp. 1363–1372 (2016)

Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. TPAMI 27(10), 1615–1630 (2005)

Müller, M., Röder, T., Clausen, M., Eberhardt, B., Krüger, B., Weber, A.: Documentation Mocap Database HDM05. Technical Report CG-2007-2, Universität Bonn (2007)

Nguyen, X., Mouaddib, A.I., Nguyen, T., Jeanpierre, L.: Action recognition in depth videos using hierarchical Gaussian descriptor. Multimedia Tools Appl. 77(16), 21617–21652 (2018)

Ojala, T., Pietikainen, M., Harwood, D.: Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 1, pp. 582–585 (1994)

Oneata, D., Verbeek, J., Schmid, C.: Action and event recognition with Fisher vectors on a compact feature set. In: ICCV, pp. 1817–1824 (2013)

Oreifej, O., Liu, Z.: HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences. In: CVPR, pp. 716–723 (2013)

Pang, Y., Yuan, Y., Li, X.: Gabor-based region covariance matrices for face recognition. IEEE Trans. Circuits Syst. Video Technol. 18(7), 989–993 (2008)

Rahmani, H., Mian, A.: 3D action recognition from novel viewpoints. In: CVPR, pp. 1506–1515 (2016)

Sanchez, J., Perronnin, F., Mensink, T., Verbeek, J.: Image classification with the Fisher vector: theory and practice. IJCV 105(3), 222–245 (2013)

Seidenari, L., Varano, V., Berretti, S., Del Bimbo, A., Pala, P.: Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses. In: CVPRW, pp. 479–485 (2013)

Serra, G., Grana, C., Manfredi, M., Cucchiara, R.: GOLD: Gaussians of local descriptors for image representation. CVIU 134, 22–32 (2015)

Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: CVPR, pp. 1010–1019 (2016)

Shi, L., Zhang, Y., Cheng, J., Lu, H.: Adaptive spectral graph convolutional networks for skeleton-based action recognition. CoRR (2018). arXiv:1805.07694

Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR, pp. 1297–1304 (2011)

Tang, S., Wang, X., Lv, X., Han, T.X., Keller, J., He, Z., Skubic, M., Lao, S.: Histogram of oriented normal vectors for object recognition with a depth sensor. In: ACCV, pp. 525–538 (2013)

Tuzel, O., Porikli, F., Meer, P.: Region covariance: a fast descriptor for detection and classification. In: ECCV, Part II, pp. 589–600 (2006)

Tuzel, O., Porikli, F., Meer, P.: Pedestrian detection via classification on Riemannian manifolds. TPAMI 30(10), 1713–1727 (2008)

Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms. In: Proceedings of the 18th ACM International Conference on Multimedia, pp. 1469–1472 (2010)

Vemulapalli, R., Arrate, F., Chellappa, R.: Human action recognition by representing 3D skeletons as points in a Lie group. In: CVPR, pp. 588–595 (2014)

Wang, C., Flynn, J., Wang, Y., Yuille, A.L.: Recognizing actions in 3D using action-snippets and activated simplices. In: AAAI, pp. 3604–3610 (2016)

Wang, C., Wang, Y., Yuille, A.L.: An approach to pose-based action recognition. In: CVPR, pp. 915–922 (2013)

Wang, C., Wang, Y., Yuille, A.L.: Mining 3D key-pose-motifs for action recognition. In: CVPR, pp. 2639–2647 (2016)

Wang, J., Liu, Z., Chorowski, J., Chen, Z., Wu, Y.: Robust 3D action recognition with random occupancy patterns. In: ECCV, pp. 872–885 (2012)

Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: CVPR, pp. 1290–1297 (2012)

Wang, L., Zhang, J., Zhou, L., Tang, C., Li, W.: Beyond covariance: feature representation with nonlinear kernel matrices. In: ICCV, pp. 4570–4578 (2015)

Wang, P., Li, W., Gao, Z., Zhang, J., Tang, C., Ogunbona, P.O.: Action recognition from depth maps using deep convolutional neural networks. IEEE Trans. Hum. Mach. Syst. 46(4), 498–509 (2016)

Wang, Q., Li, P., Zhang, L., Zuo, W.: Towards effective codebookless model for image classification. Pattern Recognit. 59(C), 63–71 (2016)

Wright, J., Ma, Y., Mairal, J., Sapiro, G., Huang, T.S., Yan, S.: Sparse representation for computer vision and pattern recognition. Proc. IEEE 98(6), 1031–1044 (2010)

Xia, L., Aggarwal, J.K.: Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. In: CVPR, pp. 2834–2841 (2013)

Xia, L., Chen, C.C., Aggarwal, J.K.: View invariant human action recognition using histograms of 3D joints. In: CVPRW, pp. 20–27 (2012)

Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI (2018)

Yang, X., Tian, Y.: Super normal vector for activity recognition using depth sequences. In: CVPR, pp. 804–811 (2014)

Yang, X., Tian, Y.L.: EigenJoints-based action recognition using Naive-Bayes-nearest-neighbor. In: CVPRW, pp. 14–19 (2012)

Yang, X., Zhang, C., Tian, Y.: Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 1057–1060 (2012)

Yi, Y., Wang, H.: Motion keypoint trajectory and covariance descriptor for human action recognition. Vis. Comput. 34(3), 391–403 (2018)

Yu, M., Liu, L., Shao, L.: Structure-preserving binary representations for RGB-D action recognition. TPAMI 38(8), 1651–1664 (2016)

Yuan, C., Hu, W., Li, X., Maybank, S., Luo, G.: Human action recognition under log-Euclidean Riemannian metric. In: ACCV, pp. 343–353 (2010)

Zanfir, M., Leordeanu, M., Sminchisescu, C.: The moving pose: an efficient 3D kinematics descriptor for low-latency action recognition and detection. In: ICCV, pp. 2752–2759 (2013)

Zhang, C., Tian, Y.: Histogram of 3D facets. CVIU 139(C), 29–39 (2015)

Zhao, G., Pietikainen, M.: Dynamic texture recognition using local binary patterns with an application to facial expressions. TPAMI 29(6), 915–928 (2007)

Zhou, X., Yu, K., Zhang, T., Huang, T.S.: Image classification using super-vector coding of local image descriptors. In: ECCV, pp. 141–154 (2010)