Video-based spatio-temporal scene graph generation with efficient self-supervised tasks

Multimedia Tools and Applications - Volume 82 - Pages 38947-38966 - 2023
Lianggangxu Chen1,2, Yiqing Cai1, Changhong Lu3, Changbo Wang1, Gaoqi He1,2
1Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, China
2School of Computer Science and Technology, East China Normal University, Shanghai, China
3School of Mathematical Sciences, East China Normal University, Shanghai, China

Abstract

Spatio-Temporal Scene Graph Generation (STSGG) aims to extract a sequence of graph-based semantic representations for high-level visual tasks. Existing works often fail to exploit the strong temporal correlations and local feature details, and therefore cannot distinguish actions between dynamic relationships (e.g., drinking) and static relationships (e.g., holding). Moreover, owing to the long-tailed bias, the predicted results suffer from misclassification of tail predicates. To address these problems, a Slow-Fast Local-Aware network (SFLA) is proposed for temporal modeling in STSGG. First, a dual-branch network is used to extract static and dynamic relationship features, respectively. Second, a Local Relation-Aware (LRA) module is proposed to assign greater importance to the key elements in local relationships. Third, three novel self-supervised tasks are proposed, namely spatial location, human attention state, and distance change. These self-supervised tasks are trained jointly with the main model to alleviate the long-tailed bias problem and enhance feature discrimination. Systematic experiments show that our method achieves the best performance on the recently proposed Action Genome (AG) dataset and the popular ImageNet Video dataset.
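The two mechanisms summarized above can be sketched in a few lines: an importance-weighted pooling of local relationship features (in the spirit of the LRA module) and a joint objective adding the three auxiliary self-supervised losses to the main predicate loss. This is a minimal illustrative sketch; the function names and the 0.1 loss weights are assumptions, not the paper's exact formulation.

```python
import math

def attention_pool(scores, feats):
    """Softmax-weighted pooling: assign larger weights to the more important
    local relationship features (sketch of the LRA idea, not the exact module)."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(feats[0])
    return [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(dim)]

def joint_loss(main_loss, aux_losses, weights=(0.1, 0.1, 0.1)):
    """Joint objective: main predicate-classification loss plus a weighted sum
    of the three self-supervised losses (spatial location, human attention
    state, distance change). The 0.1 weights are placeholder assumptions."""
    return main_loss + sum(w * l for w, l in zip(weights, aux_losses))
```

With equal importance scores, `attention_pool` reduces to a plain mean of the feature vectors; unequal scores bias the pooled relationship feature toward the key elements.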

Keywords

#Spatio-temporal scene graph generation #Slow-Fast Local-Aware network #Self-supervision #Action analysis #Temporal modeling
