Video spatiotemporal mapping for human action recognition by convolutional neural network
References
Wang X (2013) Intelligent multi-camera video surveillance: a review. Pattern Recognit Lett 34:3–19. https://doi.org/10.1016/j.patrec.2012.07.005
Liu C, Hu C, Liu Q, Aggarwal JK (2013) Video event description in scene context. Neurocomputing. 119:82–93. https://doi.org/10.1016/j.neucom.2012.03.037
Hu W, Xie N, Li L, Zeng X, Maybank S (2011) A survey on visual content-based video indexing and retrieval. IEEE Trans Syst Man Cybern 41:797–816
Zhu F, Shao L, Xie J, Fang Y (2016) From handcrafted to learned representations for human action recognition: a survey. Image Vis Comput 55(2):42–52. https://doi.org/10.1016/j.imavis.2016.06.007
Enzweiler M, Gavrila DM (2009) Monocular pedestrian detection: survey and experiments. IEEE Trans Pattern Anal Mach Intell 31:2179–2195
Barr P, Noble J, Biddle R (2007) Video game values: human–computer interaction and games. Interact Comput 19:180–195. https://doi.org/10.1016/j.intcom.2006.08.008
Gowsikhaa D, Abirami S, Baskaran R (2014) Automated human behavior analysis from surveillance videos: a survey. Artif Intell Rev 42:747–765
Afsar P, Cortez P, Santos H (2015) Automatic visual detection of human behavior: a review from 2000 to 2014. Expert Syst Appl 42:6935–6956. https://doi.org/10.1016/j.eswa.2015.05.023
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1–9
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
Wan L, Zeiler M, Zhang S, LeCun Y, Fergus R (2013) Regularization of neural networks using dropconnect. In: International conference on machine learning, ICML, pp 109–111
Wang X, Zhang L, Lin L, Liang Z, Zuo W (2014) Deep joint task learning for generic object extraction. In: Advances in neural information processing systems, pp 523–531
Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS (2015) Deep learning for visual understanding: a review. Neurocomputing. https://doi.org/10.1016/j.neucom.2015.09.116
Liu Y, Guo Y, Wu S, Lew MS (2015) Deepindex for accurate and efficient image retrieval. In: Proceedings of the 5th ACM on international conference on multimedia retrieval, pp 43–50
Wan J, Wang D, Hoi SCH, Wu P, Zhu J, Zhang Y, Li J (2014) Deep learning for content-based image retrieval: a comprehensive study. In: Proceedings of the 22nd ACM international conference on multimedia, pp 157–166
Ordóñez FJ, Roggen D (2016) Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors (Switzerland). https://doi.org/10.3390/s16010115
Ng JY, Hausknecht MJ, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: deep networks for video classification. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 4694–4702
Donahue J, Hendricks LA, Rohrbach M, Venugopalan S, Guadarrama S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2625–2634
Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35:221–231. https://doi.org/10.1109/TPAMI.2012.59
Diba A, Pazandeh AM, Van Gool L (2016) Efficient two-stream motion and appearance 3D CNNs for video classification. In: ECCV’16
Sun L, Jia K, Yeung D-Y, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: 2015 IEEE international conference on computer vision (ICCV). IEEE, pp 4597–4605
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2016) Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4489–4497
Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A (2011) Sequential deep learning for human action recognition. In: Salah AA, Lepri B (eds) Human behavior understanding. Springer, Berlin, pp 29–39
Natarajan P, Singh VK, Nevatia R (2010) Learning 3D action models from a few 2D videos for view invariant action recognition. In: Computer vision and pattern recognition (CVPR). IEEE, pp 2006–2013
Luvizon DC, Tabia H, Picard D (2017) Learning features combination for human action recognition from skeleton sequences. Pattern Recognit Lett. https://doi.org/10.1016/j.patrec.2017.02.001
Jagadeesh B, Patil CM (2016) Video based action detection and recognition human using optical flow and SVM classifier. In: 2016 IEEE international conference on recent trends in electronics, information communication technology (RTEICT), pp 1761–1765
Mocanu DC, Bou Ammar H, Lowet D, Driessens K, Liotta A, Weiss G, Tuyls K (2015) Factored four way conditional restricted Boltzmann machines for activity recognition. Pattern Recognit Lett 66:100–108. https://doi.org/10.1016/j.patrec.2015.01.013
Chaudhry R, Ravichandran A, Hager G, Vidal R (2009) Histograms of oriented optical flow and Binet–Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1932–1939
Lertniphonphan K, Aramvith S, Chalidabhongse TH (2011) Human action recognition using direction histograms of optical flow. In: 11th International symposium on communications and information technologies (ISCIT), pp 574–579
Chun S, Lee CS (2016) Human action recognition using histogram of motion intensity and direction from multiple views. IET Comput Vis 10:250–256. https://doi.org/10.1049/iet-cvi.2015.0233
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Computer vision and pattern recognition (CVPR), pp 886–893
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110:346–359. https://doi.org/10.1016/j.cviu.2007.09.014
Wang H, Klaser A, Schmid C, Liu C-L (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103:60–79
Wang H, Ullah MM, Klaser A, Laptev I, Schmid C (2009) Evaluation of local spatio-temporal features for action recognition. In: BMVC 2009-British machine vision conference, pp 121–124
Ballan L, Bertini M, Del Bimbo A, Seidenari L, Serra G (2012) Effective codebooks for human action representation and classification in unconstrained videos. IEEE Trans Multimed 14:1234–1245. https://doi.org/10.1109/TMM.2012.2191268
Laptev I, Marszalek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. In: IEEE international conference on computer vision and pattern recognition, pp 1–8
Coniglio C, Meurie C, Lézoray O, Berbineau M (2017) People silhouette extraction from people detection bounding boxes in images. Pattern Recognit Lett 93:182–191. https://doi.org/10.1016/j.patrec.2016.12.014
Zeng W, Wang C, Yang F (2014) Silhouette-based gait recognition via deterministic learning. Pattern Recognit 47:3568–3584. https://doi.org/10.1016/j.patcog.2014.04.014
Willems G, Tuytelaars T, Van Gool L (2008) An efficient dense and scale-invariant spatio-temporal interest point detector. In: 10th European conference on computer vision. Springer, Berlin, pp 650–663
Kim H-J, Lee JS, Yang H-S (2007) Human action recognition using a modified convolutional neural network. In: Liu D, Fei S, Hou Z, Zhang H, Sun C (eds) Proceedings of the 4th international symposium on neural networks: part II—advances in neural networks. Springer, Berlin, pp 715–723
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Proceedings of the 27th international conference on neural information processing systems. MIT Press, Cambridge, pp 568–576
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the 2014 IEEE conference on computer vision and pattern recognition. IEEE Computer Society, Washington, DC, pp 1725–1732
Wang L, Qiao Y, Tang X (2015) Action recognition with trajectory-pooled deep-convolutional descriptors. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 4305–4314
Wang P, Cao Y, Shen C, Liu L, Shen HT (2016) Temporal pyramid pooling based convolutional neural network for action recognition. IEEE Trans Circuits Syst Video Technol PP:1. https://doi.org/10.1109/tcsvt.2016.2576761
Abrishami Moghaddam H, Taghizadeh Khajoie T, Rouhi AH, Saadatmand-Tarzjan M (2005) Wavelet correlogram: a new approach for image indexing and retrieval. Pattern Recognit 38:2506–2518. https://doi.org/10.1016/j.patcog.2005.05.010
Laptev I (2005) On space-time interest points. Int J Comput Vis 64:107–123. https://doi.org/10.1007/s11263-005-1838-7
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR)
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML’15, JMLR.org, pp 448–456
Yu D, Eversole A, Seltzer MLM, Yao K, Kuchaiev O, Zhang Y, Seide F, Huang Z, Guenter B, Wang H, Droppo J, Zweig G, Rossbach C, Gao J, Stolcke A, Currey J, Slaney M, Chen G, Agarwal A, Basoglu C, Padmilac M, Kamenev A, Ivanov V, Cyphers S, Parthasarathi H, Mitra B, Peng B, Huang X, Akchurin E, Hillebrand M, Kranen P, Manousek W, Orlov A, Reznichenko A (2014) An introduction to computational networks and the computational network toolkit. Microsoft Research, Redmond
Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: Proceedings of the IEEE international conference on pattern recognition, pp 32–36
Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A (2012) Spatio-temporal convolutional sparse auto-encoder for sequence classification. In: BMVC, pp 1–12
Taylor GW, Fergus R, LeCun Y, Bregler C (2010) Convolutional learning of spatio-temporal features. In: Daniilidis K, Maragos P, Paragios N (eds) Computer vision—ECCV 2010. Springer, Berlin, pp 140–153
Shi Y, Zeng W, Huang T, Wang Y (2015) Learning deep trajectory descriptor for action recognition in videos using deep neural networks. In: 2015 IEEE international conference on multimedia and expo (ICME), pp 1–6
Gorelick L, Blank M, Shechtman E, Irani M, Basri R (2007) Actions as space–time shapes. IEEE Trans Pattern Anal Mach Intell 29:2247–2253
Nasiri JA, Moghadam Charkari N, Mozafari K (2014) Energy-based model of least squares twin support vector machines for human action recognition. Sig Process 104:248–257
Sheng B, Yang W, Sun C (2015) Action recognition using direction-dependent feature pairs and non-negative low rank sparse model. Neurocomputing 158:73–80
Dou J, Liu J (2014) Robust human action recognition based on spatio-temporal descriptors and motion temporal templates. Optik (Stuttg) 125:1891–1896
Klaser A, Marszalek M, Schmid C (2008) A spatio-temporal descriptor based on 3D-gradients. In: Proceedings of the British machine vision conference
Wang L, Li R, Fang Y (2016) Gradient-layer feature transform for action detection and recognition. J Vis Commun Image Represent 40:159–167. https://doi.org/10.1016/j.jvcir.2016.06.023
Al-Azzo F, Bao C, Taqi AM, Milanova M, Ghassan N (2017) Human actions recognition based on 3D deep neural network. In: 2017 Annual conference on new trends in information and communications technology applications (NTICT), pp 240–246
Soomro K, Zamir AR (2014) Action recognition in realistic sports videos. In: Moeslund TB, Thomas G, Hilton A (eds) Computer vision in sports. Springer, Cham, pp 181–208
Lan T, Wang Y, Mori G (2011) Discriminative figure-centric models for joint action localization and recognition. In: 2011 International conference on computer vision, pp 2003–2010
Raptis M, Kokkinos I, Soatto S (2012) Discovering discriminative action parts from mid-level video representations. In: 2012 IEEE conference on computer vision and pattern recognition, pp 1242–1249
Ma S, Zhang J, Ikizler-Cinbis N, Sclaroff S (2013) Action recognition and localization by hierarchical space–time segments. In: 2013 IEEE international conference on computer vision, pp 2744–2751