Multiple feature fusion in convolutional neural networks for action recognition

EDP Sciences - Volume 22 - Pages 73-78 - 2017

Hongyang Li¹, Jun Chen¹,², Ruimin Hu¹,²

¹National Engineering Research Center for Multimedia Software, Wuhan University, Hubei, China

²State Key Laboratory of Software Engineering, Wuhan University, Hubei, China

Abstract

Action recognition is important for understanding human behavior in video, and the video representation is the basis for action recognition. This paper proposes a new video representation based on convolutional neural networks (CNNs). To capture human motion information, one CNN takes both optical flow maps and gray images as input and combines multiple convolutional features by max pooling across frames. A second CNN takes a single color frame as input to capture context information. Finally, we take the top fully connected layer vectors as the video representation and train the classifiers with a linear support vector machine. The experimental results show that the representation integrating optical flow maps and gray images is more discriminative than representations that depend on only one of the two inputs. On the challenging HMDB51 and UCF101 data sets, this video representation achieves competitive performance.
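The pooling step described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes each frame has already been turned into a flat feature vector by a CNN; the function names, the toy numbers, and the fusion of the two networks' outputs by L2 normalization and concatenation are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def max_pool_across_frames(frame_features: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum over frames: T vectors of length D -> one vector of length D."""
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(frame) != dim for frame in frame_features):
        raise ValueError("all frames must have the same feature length")
    # zip(*...) groups the values of each feature dimension across all frames.
    return [max(values) for values in zip(*frame_features)]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse(motion_vec: Sequence[float], context_vec: Sequence[float]) -> Vector:
    """Hypothetical fusion: normalize each stream's vector, then concatenate.

    The abstract does not state how the two networks' outputs are combined;
    concatenation is an assumption made for this sketch.
    """
    return l2_normalize(motion_vec) + l2_normalize(context_vec)


# Toy stand-ins for per-frame motion features (3 frames, 2 dimensions each).
frames = [[3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
pooled = max_pool_across_frames(frames)

# Toy stand-in for the context feature of a single color frame.
context = [0.0, 2.0]
video_repr = fuse(pooled, context)
```

In a full pipeline, vectors like `video_repr` would be the inputs to the linear support vector machine mentioned in the abstract; the classifier itself is omitted here.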

