Spatiotemporal wavelet correlogram for human action recognition

Hamid Abrishami Moghaddam1, Amin Zare2
1Faculty of Electrical and Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
2Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Abstract

In this paper, we present a spatiotemporal wavelet correlogram (STWC) as a new feature for human action recognition (HAR) in videos. The proposed feature does not follow the common pipeline of interest point detection, descriptor representation and bag of visual words. It requires neither motion estimation (tracking) nor background/foreground subtraction. STWC is computed more efficiently than state-of-the-art HAR methods while achieving comparable results. It exploits the multi-scale, multi-resolution property of the wavelet transform and takes into account the correlation among wavelet coefficients. STWC is obtained by computing the spatiotemporal correlogram of quantized wavelet coefficients, which are produced by a 3D wavelet decomposition followed by a simple quantization method. Based on our findings, we also recommend which wavelet subbands are the richest, and therefore the best choice for computing STWC.
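To make the three stages named above (3D wavelet decomposition, quantization, spatiotemporal correlogram) more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation. The one-level Haar filter, the uniform quantizer on coefficient magnitudes, the axis-aligned neighborhood used for the correlogram, and the default subband list are all assumptions made for illustration, and the function names (`haar_3d_level1`, `quantize`, `autocorrelogram`, `stwc_feature`) are hypothetical.

```python
import numpy as np


def haar_3d_level1(video):
    """One level of a separable, orthonormal 3D Haar DWT on a (T, H, W) volume.

    Returns a dict mapping 3-letter subband names (one L/H letter per axis,
    in t, y, x order) to arrays of shape (T//2, H//2, W//2).
    """
    v = np.asarray(video, dtype=float)
    # Crop every axis to an even length so samples pair up cleanly.
    v = v[: v.shape[0] // 2 * 2, : v.shape[1] // 2 * 2, : v.shape[2] // 2 * 2]
    bands = {"": v}
    for axis in range(3):
        split = {}
        for name, b in bands.items():
            even = np.take(b, np.arange(0, b.shape[axis], 2), axis=axis)
            odd = np.take(b, np.arange(1, b.shape[axis], 2), axis=axis)
            split[name + "L"] = (even + odd) / np.sqrt(2)  # low-pass (average)
            split[name + "H"] = (even - odd) / np.sqrt(2)  # high-pass (detail)
        bands = split
    return bands


def quantize(coeffs, levels=4):
    """Assumed quantizer: uniform bins on |coefficient|, labels 0..levels-1."""
    mag = np.abs(coeffs)
    top = mag.max()
    if top == 0:
        return np.zeros(mag.shape, dtype=int)
    labels = np.floor(mag / top * levels).astype(int)
    return np.minimum(labels, levels - 1)  # the maximum falls in the top bin


def autocorrelogram(labels, levels, distances=(1, 2, 3)):
    """Estimate P(neighbor label == c | voxel label == c) for each level c
    and distance d, pooling the six axis-aligned offsets (+/-d along t, y, x).
    Returns an array of shape (levels, len(distances))."""
    out = np.zeros((levels, len(distances)))
    for j, d in enumerate(distances):
        same = np.zeros(levels)
        total = np.zeros(levels)
        for axis in range(labels.ndim):
            n = labels.shape[axis]
            if n <= d:
                continue  # no voxel pairs at this distance along this axis
            a = np.take(labels, np.arange(0, n - d), axis=axis)
            b = np.take(labels, np.arange(d, n), axis=axis)
            for c in range(levels):
                in_a, in_b = a == c, b == c
                # Each matching pair counts once for +d and once for -d.
                same[c] += 2 * np.sum(in_a & in_b)
                total[c] += np.sum(in_a) + np.sum(in_b)
        out[:, j] = np.divide(same, total, out=np.zeros(levels), where=total > 0)
    return out


def stwc_feature(video, subbands=("LLH", "LHL", "HLL"), levels=4,
                 distances=(1, 2, 3)):
    """Concatenate the autocorrelograms of the chosen (hypothetical) subbands."""
    bands = haar_3d_level1(video)
    parts = [autocorrelogram(quantize(bands[s], levels), levels, distances).ravel()
             for s in subbands]
    return np.concatenate(parts)


rng = np.random.default_rng(0)
clip = rng.random((16, 32, 32))  # toy grayscale clip: 16 frames of 32x32 pixels
feat = stwc_feature(clip)
print(feat.shape)  # (36,) = 3 subbands x 4 levels x 3 distances
```

Restricting the sketch to same-label pairs (an autocorrelogram rather than a full correlogram) keeps the feature length at subbands × levels × distances. The resulting vector could then be passed to any standard classifier, such as an SVM.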
