Video spatiotemporal mapping for human action recognition by convolutional neural network

Amin Zare, Hamid Abrishami Moghaddam1, Arash Sharifi2
1Faculty of Electrical and Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
2Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Tóm tắt

Từ khóa


Tài liệu tham khảo

Wang X (2013) Intelligent multi-camera video surveillance: a review. Pattern Recognit Lett 34:3–19. https://doi.org/10.1016/j.patrec.2012.07.005

Liu C, Hu C, Liu Q, Aggarwal JK (2013) Video event description in scene context. Neurocomputing. 119:82–93. https://doi.org/10.1016/j.neucom.2012.03.037

Hu W, Xie N, Li L, Zeng X, Maybank S (2011) A survey on visual content-based video indexing and retrieval. IEEE Trans Syst Man Cybern 41:797–816

Zhu F, Shao L, Xie J, Fang Y (2016) From handcrafted to learned representations for human action recognition: a survey. Image Vis Comput 55(2):42–52. https://doi.org/10.1016/j.imavis.2016.06.007

Enzweiler M, Gavrila DM (2009) Monocular pedestrian detection: survey and experiments. IEEE Trans Pattern Anal Mach Intell 31:2179–2195

Barr P, Noble J, Biddle R (2007) Video game values: human–computer interaction and games. Interact Comput 19:180–195. https://doi.org/10.1016/j.intcom.2006.08.008

Gowsikhaa D, Abirami S, Baskaran R (2014) Automated human behavior analysis from surveillance videos: a survey. Artif Intell Rev 42:747–765

Afsar P, Cortez P, Santos H (2015) Automatic visual detection of human behavior: a review from 2000 to 2014. Expert Syst Appl 42:6935–6956. https://doi.org/10.1016/j.eswa.2015.05.023

Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1–9

He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778

Wan L, Zeiler M, Zhang S, LeCun Y, Fergus R (2013) Regularization of neural networks using dropconnect. In: International conference on machine learning, ICML, pp 109–111

Wang X, Zhang L, Lin L, Liang Z, Zuo W (2014) Deep joint task learning for generic object extraction. In: Advances in neural information processing systems, pp 523–531

Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS (2015) Deep learning for visual understanding: a review. Neurocomputing. https://doi.org/10.1016/j.neucom.2015.09.116

Liu Y, Guo Y, Wu S, Lew MS (2015) Deepindex for accurate and efficient image retrieval. In: Proceedings of the 5th ACM on international conference on multimedia retrieval, pp 43–50

Wan J, Wang D, Hoi SCH, Wu P, Zhu J, Zhang Y, Li J (2014) Deep learning for content-based image retrieval: a comprehensive study. In: Proceedings of the 22nd ACM international conference on multimedia, pp 157–166

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780

Ordóñez FJ, Roggen D (2016) Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors (Switzerland). https://doi.org/10.3390/s16010115

Ng JY, Hausknecht MJ, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: deep networks for video classification. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 4694–4702

Donahue J, Hendricks LA, Rohrbach M, Venugopalan S, Guadarrama S, Saenko K, Darrell T, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2625–2634

Ji S, Yang M, Yu K, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35:221–231. https://doi.org/10.1109/TPAMI.2012.59

Diba A, Pazandeh AM, Van Gool L (2016) Efficient two-stream motion and appearance 3D CNNs for video classification. In: ECCV’16

Sun L, Jia K, Yeung D-Y, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: 2015 IEEE international conference on computer vision (ICCV). IEEE, pp 4597–4605

Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2016) Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4489–4497

Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A (2011) Sequential deep learning for human action recognition. In: Salah AA, Lepri B (eds) Human behavior understanding. Springer, Berlin, pp 29–39

Natarajan P, Singh VK, Nevatia R (2010) Learning 3D action models from a few 2D videos for view invariant action recognition. In: Computer vision and pattern recognition (CVPR). IEEE, pp 2006–2013

Luvizon DC, Tabia H, Picard D (2017) Learning features combination for human action recognition from skeleton sequences. Pattern Recognit Lett. https://doi.org/10.1016/j.patrec.2017.02.001

Jagadeesh B, Patil CM (2016) Video based action detection and recognition human using optical flow and SVM classifier. In: 2016 IEEE international conference on recent trends in electronics, information communication technology (RTEICT), pp 1761–1765

Mocanu DC, Bou Ammar H, Lowet D, Driessens K, Liotta A, Weiss G, Tuyls K (2015) Factored four way conditional restricted Boltzmann machines for activity recognition. Pattern Recognit Lett 66:100–108. https://doi.org/10.1016/j.patrec.2015.01.013

Chaudhry R, Ravichandran A, Hager G, Vidal R (2009) Histograms of oriented optical flow and Binet–Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1932–1939

Lertniphonphan K, Aramvith S, Chalidabhongse TH (2011) Human action recognition using direction histograms of optical flow. In: 11th International symposium on communications and information technologies (ISCIT), pp 574–579

Chun S, Lee CS (2016) Human action recognition using histogram of motion intensity and direction from multiple views. IET Comput Vis 10:250–256. https://doi.org/10.1049/iet-cvi.2015.0233

Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Computer vision and pattern recognition (CVPR), pp 886–893

Bay H, Ess A, Tuytelaars T, Gool L Van (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110:346–359. https://doi.org/10.1016/j.cviu.2007.09.014

Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Comput Vis 60:91–110

Wang H, Klaser A, Schmid C, Liu C-L (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103:60–79

Wang H, Ullah MM, Klaser A, Laptev I, Schmid C, Klaser A, Laptev I, Schmid C (2009) Evaluation of local spatio-temporal features for action recognition. In: BMVC 2009-British machine vision conference, pp 121–124

Ballan L, Bertini M, Bimbo A Del, Seidenari L, Serra G (2012) Effective codebooks for human action representation and classification in unconstrained videos. IEEE Trans Multimed 14:1234–1245. https://doi.org/10.1109/TMM.2012.2191268

Laptev I, Marszalek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. In: IEEE international conference on computer vision and pattern recognition, pp 1–8

Coniglio C, Meurie C, Lézoray O, Berbineau M (2017) People silhouette extraction from people detection bounding boxes in images. Pattern Recognit Lett 93:182–191. https://doi.org/10.1016/j.patrec.2016.12.014

Zeng W, Wang C, Yang F (2014) Silhouette-based gait recognition via deterministic learning. Pattern Recognit 47:3568–3584. https://doi.org/10.1016/j.patcog.2014.04.014

Willems G, Tuytelaars T, Gool L-V (2008) An efficient dense and scale-invariant spatio-temporal interest point detector. In: 10th European conference on computer vision. Springer, Berlin, pp 650–663

Kim H-J, Lee JS, Yang H-S (2007) Human action recognition using a modified convolutional neural network. In: Liu D, Fei S, Hou Z, Zhang H, Sun C (eds) Proceedings of the 4th international symposium on neural networks: part II—advances in neural networks. Springer, Berlin, pp 715–723

Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Proceedings of the 27th international conference on neural information processing systems. MIT Press, Cambridge, pp 568–576

Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the 2014 IEEE conference on computer vision and pattern recognition. IEEE Computer Society, Washington, DC, pp 1725–1732

Wang L, Qiao Y, Tang X (2015) Action recognition with trajectory-pooled deep-convolutional descriptors. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 4305–4314

Wang P, Cao Y, Shen C, Liu L, Shen HT (2016) Temporal pyramid pooling based convolutional neural network for action recognition. IEEE Trans Circuits Syst Video Technol PP:1. https://doi.org/10.1109/tcsvt.2016.2576761

Abrishami Moghaddam H, Taghizadeh Khajoie T, Rouhi AH, Saadatmand-Tarzjan M (2005) Wavelet correlogram: a new approach for image indexing and retrieval. Pattern Recognit 38:2506–2518. https://doi.org/10.1016/j.patcog.2005.05.010

Laptev I (2005) On space-time interest points. Int J Comput Vis 64:107–123. https://doi.org/10.1007/s11263-005-1838-7

Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. Int Conf Learn Represent. https://doi.org/10.1016/j.infsof.2008.09.005

Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML’15, JMLR.org, pp 448–456

Yu D, Eversole A, Seltzer MLM, Yao K, Kuchaiev O, Zhang Y, Seide F, Huang Z, Guenter B, Wang H, Droppo J, Zweig G, Rossbach C, Gao J, Stolcke A, Currey J, Slaney M, Chen G, Agarwal A, Basoglu C, Padmilac M, Kamenev A, Ivanov V, Cypher S, Parthasarathi H, Mitra B, Peng B, Huang X, Akchurin E, Basoglu C, Chen G, Cyphers S, Droppo J, Eversole A, Guenter B, Hillebrand M, Huang X, Huang Z, Ivanov V, Kamenev A, Kranen P, Kuchaiev O, Manousek W, Orlov A, Padmilac M, Parthasarathi H, Peng B, Reznichenko A, Seide F, Seltzer MLM, Slaney M, Stolcke A, Wang H, Yao K, Yu D (2014) An introduction to computational networks and the computational network toolkit. Microsoft Research, Redmond

Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: Proceedings of the IEEE international conference on pattern recognition, pp 32–36

Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A (2012) Spatio-temporal convolutional sparse auto-encoder for sequence classification. In: BMVC, pp 1–12

Taylor GW, Fergus R, LeCun Y, Bregler C (2010) Convolutional learning of spatio-temporal features. In: Daniilidis K, Maragos P, Paragios N (eds) Computer vision—ECCV 2010. Springer, Berlin, pp 140–153

Shi Y, Zeng W, Huang T, Wang Y (2015) Learning deep trajectory descriptor for action recognition in videos using deep neural networks. In: 2015 IEEE international conference on multimedia and expo (ICME), pp 1–6

Gorelick L, Blank M, Shechtman E, Irani M, Basri R (2007) Actions as space–time shapes. IEEE Trans Pattern Anal Mach Intell 29:2247–2253

Nasiri JA, Moghadam Charkari N, Mozafari K (2014) Energy-based model of least squares twin support vector machines for human action recognition. Sig Process 104:248–257

Sheng B, Yang W, Sun C (2015) Action recognition using direction-dependent feature pairs and non-negative low rank sparse model. Neurocomputing 158:73–80

Dou J, Liu J (2014) Robust human action recognition based on spatio-temporal descriptors and motion temporal templates. Optik (Stuttg) 125:1891–1896

Klaser A, Marszalek M, Schmid C (2008) A spatio-temporal descriptor based on 3D-gradients. In: Proceedings of the British machine vision conference

Wang L, Li R, Fang Y (2016) Gradient-layer feature transform for action detection and recognition. J Vis Commun Image Represent 40:159–167. https://doi.org/10.1016/j.jvcir.2016.06.023

Al-Azzo F, Bao C, Taqi AM, Milanova M, Ghassan N (2017) Human actions recognition based on 3D deep neural network. In: 2017 Annual conference on new trends in information and communications technology applications (NTICT), pp 240–246

Soomro K, Zamir AR (2014) Action recognition in realistic sports videos. In: Moeslund TB, Thomas G, Hilton A (eds) Computer vision in sports. Springer, Cham, pp 181–208

Lan T, Wang Y, Mori G (2011) Discriminative figure-centric models for joint action localization and recognition. In: 2011 International conference on computer vision, pp 2003–2010

Raptis M, Kokkinos I, Soatto S (2012) Discovering discriminative action parts from mid-level video representations. In: 2012 IEEE conference on computer vision and pattern recognition, pp 1242–1249

Ma S, Zhang J, Ikizler-Cinbis N, Sclaroff S (2013) Action recognition and localization by hierarchical space–time segments. In: 2013 IEEE international conference on computer vision, pp 2744–2751

Yu J, Jeon M, Pedrycz W (2014) Weighted feature trajectories and concatenated bag-of-features for action recognition. Neurocomputing 131:200–207