Improving dynamic gesture recognition in untrimmed videos by an online lightweight framework and a new gesture dataset ZJUGesture

Neurocomputing - Volume 523 - Pages 58-68 - 2023
Chao Xu¹, Xia Wu¹, Mengmeng Wang¹, Feng Qiu¹, Yong Liu¹, Jun Ren²
¹ State Key Laboratory of Industrial Control Technology and Institute of Cyber-systems and Control, Zhejiang University, China
² Beijing Institute of Mechanical and Electrical Engineering, China

References

Pandit, 2009, A simple wearable hand gesture recognition device using imems, 592
Abhishek, 2016, Glove-based hand gesture recognition sign language translator using capacitive touch sensor, 334
P. Kumar, J. Verma, S. Prasad, Hand data glove: a wearable real-time device for human-computer interaction, International Journal of Advanced Science and Technology 43.
H. Kenn, F. Van Megen, R. Sugar, A glove-based gesture interface for wearable computing applications, in: 4th International Forum on Applied Wearable Computing 2007, VDE, 2007, pp. 1–10.
Kalgaonkar, 2009, One-handed gesture recognition using ultrasonic doppler sonar, 1889
Li, 2012, Hand gesture recognition using kinect, 196
P. Molchanov, S. Gupta, K. Kim, J. Kautz, Hand gesture recognition with 3d convolutional neural networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2015, pp. 1–7.
Neverova, 2014, Multi-scale deep learning for gesture detection and localization, 474
Molchanov, 2015, Multi-sensor system for driver’s hand-gesture recognition, Vol. 1, 1
Ohn-Bar, 2014, Hand gesture recognition in real time for automotive interfaces: A multimodal vision-based approach and evaluations, IEEE transactions on intelligent transportation systems, 15, 2368, 10.1109/TITS.2014.2337331
Tang, 2019, Fast and robust dynamic hand gesture recognition via key frames extraction and feature fusion, Neurocomputing, 331, 424, 10.1016/j.neucom.2018.11.038
Hu, 2018, 3d separable convolutional neural network for dynamic hand gesture recognition, Neurocomputing, 318, 151, 10.1016/j.neucom.2018.08.042
Cao, 2019, Real-time gesture recognition based on feature recalibration network with multi-scale information, Neurocomputing, 347, 119, 10.1016/j.neucom.2019.03.019
Kuehne, 2011, Hmdb: a large video database for human motion recognition, 2556
K. Soomro, A.R. Zamir, M. Shah, Ucf101: A dataset of 101 human actions classes from videos in the wild, arXiv preprint arXiv:1212.0402.
Zhang, 2018, Egogesture: a new dataset and benchmark for egocentric hand gesture recognition, IEEE Transactions on Multimedia, 20, 1038, 10.1109/TMM.2018.2808769
P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree, J. Kautz, Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural network, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4207–4215.
J. Materzynska, G. Berger, I. Bax, R. Memisevic, The jester dataset: A large-scale video dataset of human gestures, in: Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019, pp. 0–0.
Wren, 1997, Pfinder: Real-time tracking of the human body, IEEE Transactions on pattern analysis and machine intelligence, 19, 780, 10.1109/34.598236
D. Fleet, Y. Weiss, Optical flow estimation, in: Handbook of mathematical models in computer vision, Springer, 2006, pp. 237–257.
M. Danelljan, G. Bhat, F. Shahbaz Khan, M. Felsberg, Eco: Efficient convolution operators for tracking, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 6638–6646.
Danelljan, 2016, Beyond correlation filters: Learning continuous convolution operators for visual tracking, 472
Kaufmann, 2010, Hand posture recognition using real-time artificial evolution, 251
Weng, 2010, Robust hand posture recognition integrating multi-cue hand tracking, 497
Flasiński, 2010, On the use of graph parsing for recognition of isolated hand postures of polish sign language, Pattern Recognition, 43, 2249, 10.1016/j.patcog.2010.01.004
J. Carreira, A. Zisserman, Quo vadis, action recognition? a new model and the kinetics dataset, in: proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6299–6308.
P. Narayana, R. Beveridge, B.A. Draper, Gesture recognition: Focus on the hands, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 5235–5244.
K. Simonyan, A. Zisserman, Two-stream convolutional networks for action recognition in videos, in: Advances in neural information processing systems, 2014, pp. 568–576.
B. Zhang, L. Wang, Z. Wang, Y. Qiao, H. Wang, Real-time action recognition with enhanced motion vector cnns, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2718–2726.
J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell, Long-term recurrent convolutional networks for visual recognition and description, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 2625–2634.
Hochreiter, 1997, Long short-term memory, Neural computation, 9, 1735, 10.1162/neco.1997.9.8.1735
D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, Learning spatiotemporal features with 3d convolutional networks, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 4489–4497.
Z. Qiu, T. Yao, T. Mei, Learning spatio-temporal representation with pseudo-3d residual networks, in: proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5533–5541.
Köpüklü, 2019, Real-time hand gesture detection and classification using convolutional neural networks, 1
Q. Miao, Y. Li, W. Ouyang, Z. Ma, X. Xu, W. Shi, X. Cao, Multimodal gesture recognition based on the resc3d network, in: Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 3047–3055.
D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, M. Paluri, A closer look at spatiotemporal convolutions for action recognition, in: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2018, pp. 6450–6459.
Xu, 2021, Cross-modality online distillation for multi-view action recognition, Neurocomputing, 456, 384, 10.1016/j.neucom.2021.05.077
C. Yang, Y. Xu, J. Shi, B. Dai, B. Zhou, Temporal pyramid network for action recognition, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 591–600.
Wang, 2016, Temporal segment networks: Towards good practices for deep action recognition, 20
Zhang, 2020, Gesture recognition based on deep deformable 3d convolutional neural networks, Pattern Recognition, 107, 10.1016/j.patcog.2020.107416
G. Sung, K. Sokal, E. Uboweja, V. Bazarevsky, J. Baccash, E.G. Bazavan, C.-L. Chang, M. Grundmann, On-device real-time hand gesture recognition, arXiv preprint arXiv:2111.00038.
Benitez-Garcia, 2021, Ipn hand: A video dataset and benchmark for real-time continuous hand gesture recognition, 4340
J. Wan, Y. Zhao, S. Zhou, I. Guyon, S. Escalera, S.Z. Li, Chalearn looking at people rgb-d isolated and continuous datasets for gesture recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2016, pp. 56–64.
K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.
K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167.
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.
C. Szegedy, S. Ioffe, V. Vanhoucke, A.A. Alemi, Inception-v4, inception-resnet and the impact of residual connections on learning, in: Thirty-first AAAI conference on artificial intelligence, 2017.
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, Mobilenetv2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520.
B. Zhou, A. Andonian, A. Oliva, A. Torralba, Temporal relational reasoning in videos, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 803–818.
X. Zhang, X. Zhou, M. Lin, J. Sun, Shufflenet: An extremely efficient convolutional neural network for mobile devices.
J. Lin, C. Gan, S. Han, Tsm: Temporal shift module for efficient video understanding, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7083–7093.