TGLC: Visual object tracking by fusion of global-local information and channel information

Shuo Zhang1, Dan Zhang2, Qi Zou1
1Lassonde School of Engineering, York University, Toronto, Canada
2Department of Mechanical Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong

Abstract

Visual object tracking aims to locate a target continuously in every frame, given only its initial location, and is an essential yet challenging task in computer vision. Recent approaches fuse global information from the template and the search region and achieve promising tracking performance. However, fusing global information destroys some local details, and local information is essential for distinguishing the target from background regions. To address this problem, this work presents a novel tracking algorithm, TGLC, which integrates a channel-aware convolution block with Transformer attention to aggregate global and local representations and to model channel information. The method is able to estimate the target's bounding box accurately. Extensive experiments were conducted on five widely used datasets: GOT-10k, TrackingNet, LaSOT, OTB100 and UAV123. The results show that the proposed tracker achieves performance competitive with state-of-the-art trackers while still running at real-time speed. Visualization of the tracking results on LaSOT further demonstrates that the proposed tracker can cope with tracking challenges such as illumination variation, target deformation and background clutter.
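The idea of combining a channel-aware convolution with attention can be illustrated with a small NumPy sketch. It is not the TGLC architecture: the abstract gives no layer details, so every function name, the squeeze-and-excitation-style gate, the depthwise 3x3 convolution, the single-head self-attention and the additive fusion are assumptions chosen only to show how a local branch, a global branch and channel weighting might be combined on one feature map.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_gate(feat, w1, w2):
    """Squeeze-and-excitation-style gate (assumed): (C, H, W) -> weights in (0, 1), shape (C,)."""
    squeezed = feat.mean(axis=(1, 2))            # global average pool per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)      # bottleneck with ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid

def local_branch(feat, kernel):
    """Depthwise 3x3 convolution with zero padding; kernel has shape (C, 3, 3)."""
    _, h, w = feat.shape
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            out += kernel[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def global_branch(feat, wq, wk, wv):
    """Single-head self-attention over all H*W positions (global context)."""
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T            # (N, C), one token per position
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(c))         # (N, N) attention weights
    return (attn @ v).T.reshape(c, h, w)

def fuse_global_local(feat, params):
    """Hypothetical fusion: residual + channel-gated local features + global features."""
    gate = channel_gate(feat, params["w1"], params["w2"])
    local = local_branch(feat, params["kernel"]) * gate[:, None, None]
    glob = global_branch(feat, params["wq"], params["wk"], params["wv"])
    return feat + local + glob

def init_params(channels, reduction=4, seed=0):
    """Random illustrative weights; a real tracker would learn these."""
    rng = np.random.default_rng(seed)
    hidden = max(channels // reduction, 1)
    scale = 0.1
    return {
        "w1": scale * rng.standard_normal((hidden, channels)),
        "w2": scale * rng.standard_normal((channels, hidden)),
        "kernel": scale * rng.standard_normal((channels, 3, 3)),
        "wq": scale * rng.standard_normal((channels, channels)),
        "wk": scale * rng.standard_normal((channels, channels)),
        "wv": scale * rng.standard_normal((channels, channels)),
    }

if __name__ == "__main__":
    feat = np.random.default_rng(1).standard_normal((8, 6, 6))
    fused = fuse_global_local(feat, init_params(8))
    print(fused.shape)
```

In this sketch the convolution sees only a 3x3 neighbourhood, which preserves local detail, while the attention branch mixes every position with every other one. The gate rescales channels of the local branch before the two are summed.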

Keywords

#visual object tracking #computer vision #global information #local information #channel-aware convolution block #Transformer attention
