LM-Net: A dynamic gesture recognition network with long-term aggregation and motion excitation
International Journal of Machine Learning and Cybernetics - Pages 1-13 - 2023
Abstract
In recent years, there has been growing interest in dynamic hand gestures as a natural modality for human-computer interaction. However, existing dynamic gesture recognition methods still have limitations, particularly in consistently capturing and focusing on the moving hand region across different motion patterns. This paper introduces LM-Net, a novel and efficient network composed of a Long-term Aggregation Module and a Motion Excitation Module. The Motion Excitation Module exploits motion information extracted from adjacent frames to enhance motion-sensitive channels, while the Long-term Aggregation Module uses dynamic convolution to capture temporal information from diverse motion patterns. Extensive experiments on the EgoGesture and Jester datasets show that LM-Net outperforms most existing methods in accuracy while maintaining a favorable computational cost.
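The abstract describes both modules only at a high level, so the following is a hypothetical NumPy sketch, not LM-Net's actual implementation. `motion_excitation` illustrates the general motion-excitation idea (adjacent-frame feature differences used to gate channels, similar in spirit to earlier work such as TEA), and `dynamic_temporal_conv` shows one possible form of dynamic convolution, where a per-channel temporal kernel is generated from the input itself. The function names, the `(T, C, H, W)` feature layout, and the weights `w_mix` and `w_gen` are all assumptions made for illustration.

```python
from typing import Optional

import numpy as np


def motion_excitation(x: np.ndarray, w_mix: Optional[np.ndarray] = None) -> np.ndarray:
    """Hypothetical channel-wise motion excitation on features x of shape (T, C, H, W)."""
    diff = np.zeros_like(x, dtype=float)
    diff[:-1] = x[1:] - x[:-1]          # adjacent-frame difference; last frame gets zero motion
    pooled = diff.mean(axis=(2, 3))     # (T, C) spatially pooled motion descriptor
    if w_mix is not None:
        pooled = pooled @ w_mix         # hypothetical learned (C, C) channel mixing
    attn = 2.0 / (1.0 + np.exp(-pooled)) - 1.0  # 2*sigmoid - 1, in (-1, 1); zero motion -> 0
    return x + x * attn[:, :, None, None]       # residual form keeps the original signal


def dynamic_temporal_conv(x: np.ndarray, w_gen: np.ndarray) -> np.ndarray:
    """Depthwise temporal convolution whose kernels are generated from the input.

    x: (T, C, H, W); w_gen: hypothetical (T, K) kernel-generator weights, K odd.
    """
    t = x.shape[0]
    k = w_gen.shape[1]
    desc = x.mean(axis=(2, 3)).T        # (C, T): temporal profile of each channel
    logits = desc @ w_gen               # (C, K): one kernel per channel, per video
    logits = logits - logits.max(axis=1, keepdims=True)
    kernels = np.exp(logits)
    kernels /= kernels.sum(axis=1, keepdims=True)  # softmax: each kernel sums to 1
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0), (0, 0)))  # zero-pad along time only
    out = np.zeros_like(x, dtype=float)
    for j in range(k):
        # out[t] += kernels[:, j] * x[t + j - pad], applied channel by channel
        out += kernels[:, j][None, :, None, None] * xp[j:j + t]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((8, 4, 7, 7))   # 8 frames, 4 channels, 7x7 maps
    y = dynamic_temporal_conv(motion_excitation(feats), np.zeros((8, 3)))
    print(y.shape)  # (8, 4, 7, 7)
```

The `2*sigmoid - 1` gate lets a static clip pass through unchanged, and the softmax keeps each generated kernel a weighted average over neighboring frames; both are illustrative choices, not details taken from the paper.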
Keywords
#dynamic gestures #gesture recognition #human-computer interaction #LMNet #Long-term Aggregation Module #Motion Excitation Module
