LM-Net: A dynamic gesture recognition network with long-term aggregation and motion excitation

Shaopeng Chang1, Xueyu Huang1,2,3
1School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China
2Nanchang Key Laboratory of Virtual Digital Factory and Cultural Communications, Jiangxi University of Science and Technology, Nanchang, China
3Ganzhou 5G Industry Development Institute, Ganzhou, China

Abstract

In recent years, there has been growing interest in dynamic hand gestures as a natural modality for human-computer interaction. However, existing methods for dynamic gesture recognition still have notable limitations, particularly in continuously capturing and focusing on the moving hand region across different motion patterns. This paper introduces LM-Net, a novel and efficient network composed of a Long-term Aggregation Module and a Motion Excitation Module. The Motion Excitation Module exploits motion information extracted from adjacent frames to enhance motion-sensitive channels, while the Long-term Aggregation Module employs dynamic convolution to absorb temporal information from diverse motion patterns. Extensive experiments on the EgoGesture and Jester datasets show that LM-Net outperforms most existing methods in accuracy while maintaining a favorable computational cost.
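The motion-excitation idea described above (using differences between adjacent frames to gate motion-sensitive channels) can be sketched in a few lines of NumPy. This is an illustrative simplification, not LM-Net's actual module: the paper's version would use learned convolutions for channel reduction and gating, which are omitted here, and the function name and tensor layout are assumptions for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def motion_excitation(x):
    """Simplified motion-excitation sketch (hypothetical, no learned weights).

    x: video features of shape (T, C, H, W).
    Adjacent-frame differences are spatially pooled into per-channel motion
    descriptors; a sigmoid turns them into channel-attention gates that
    amplify motion-sensitive channels through a residual connection.
    """
    T, C, H, W = x.shape
    diff = x[1:] - x[:-1]                           # (T-1, C, H, W) motion cues
    diff = np.concatenate([diff, np.zeros((1, C, H, W))], axis=0)
    motion = diff.mean(axis=(2, 3))                 # (T, C) spatial average pool
    attn = sigmoid(motion)[:, :, None, None]        # (T, C, 1, 1) channel gates
    return x + x * attn                             # residual excitation
```

For a perfectly static clip the frame differences vanish, so every gate sits at sigmoid(0) = 0.5 and the output is a uniform 1.5x of the input; channels whose activations change between frames receive proportionally stronger amplification.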

Keywords

Dynamic gestures · Gesture recognition · Human-computer interaction · LM-Net · Long-term Aggregation Module · Motion Excitation Module
