Exploring fusion strategies for accurate RGBT visual object tracking

Information Fusion - Volume 99 - Page 101881 - 2023
Zhangyong Tang1, Tianyang Xu1, Hui Li1, Xiao-Jun Wu1, XueFeng Zhu1, Josef Kittler2
1Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, School of Artificial Intelligence and Computer Science, Jiangnan University, 214122, Wuxi, China
2The Center for Vision, Speech and Signal Processing, University of Surrey, Guildford, GU2 7XH, UK

References

Li, 2023, Characteristic evaluation via multi-sensor information fusion strategy for spherical underwater robots, Inf. Fusion, 95, 199, 10.1016/j.inffus.2023.02.024
De-la-Torre, 2015, Partially-supervised learning from facial trajectories for face recognition in video surveillance, Inf. Fusion, 24, 31, 10.1016/j.inffus.2014.05.006
Song, 2013, A novel dynamic model for multiple pedestrians tracking in extremely crowded scenarios, Inf. Fusion, 14, 301, 10.1016/j.inffus.2012.08.004
Liu, 2022, Learning dual-level deep representation for thermal infrared tracking, IEEE Trans. Multimed., 1, 10.1109/TMM.2022.3197364
M. Kristan, J. Matas, A. Leonardis, et al., The Seventh Visual Object Tracking VOT2019 Challenge Results, in: 2019 IEEE/CVF International Conference on Computer Vision Workshop, ICCVW, 2019, pp. 2206–2241.
B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing, J. Yan, SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 4277–4286.
L. Zhang, M. Danelljan, A. Gonzalez-Garcia, J. van de Weijer, F. Shahbaz Khan, Multi-Modal Fusion for End-to-End RGB-T Tracking, in: 2019 IEEE/CVF International Conference on Computer Vision Workshop, ICCVW, 2019, pp. 2252–2261.
C.L. Li, A. Lu, A.H. Zheng, Z. Tu, J. Tang, Multi-Adapter RGBT Tracking, in: 2019 IEEE/CVF International Conference on Computer Vision Workshop, ICCVW, 2019, pp. 2262–2270.
Y. Xiao, M. Yang, C. Li, L. Liu, J. Tang, Attribute-based progressive fusion network for RGBT tracking, in: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36, No. 3, 2022, pp. 2831–2838.
Li, 2019, RGB-T object tracking: Benchmark and baseline, Pattern Recognit., 96, 10.1016/j.patcog.2019.106977
Li, 2016, Learning collaborative sparse representation for grayscale-thermal tracking, IEEE Trans. Image Process., 25, 5743, 10.1109/TIP.2016.2614135
Torabi, 2012, An iterative integrated framework for thermal-visible image registration, sensor fusion, and people tracking for video surveillance applications, Comput. Vis. Image Underst., 116, 210, 10.1016/j.cviu.2011.10.006
Davis, 2007, Background-subtraction using contour-based fusion of thermal and visible imagery, Comput. Vis. Image Underst., 106, 162, 10.1016/j.cviu.2006.06.010
Li, 2022, LasHeR: A large-scale high-diversity benchmark for RGBT tracking, IEEE Trans. Image Process., 31, 392, 10.1109/TIP.2021.3130533
P. Zhang, J. Zhao, D. Wang, H. Lu, X. Ruan, Visible-thermal UAV tracking: A large-scale benchmark and new baseline, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8886–8895.
Feng, 2020, Learning discriminative update adaptive spatial-temporal regularized correlation filter for RGB-T tracking, J. Vis. Commun. Image Represent., 72, 10.1016/j.jvcir.2020.102881
Xu, 2019, Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking, IEEE Trans. Image Process., 28, 5596, 10.1109/TIP.2019.2919201
T. Xu, Z. Feng, X. Wu, J. Kittler, Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking, in: 2019 IEEE/CVF International Conference on Computer Vision, ICCV, 2019, pp. 7949–7959.
T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C.L. Zitnick, Microsoft COCO: Common Objects in Context, in: European Conference on Computer Vision, ECCV, 2014, pp. 740–755.
N. Xu, L. Yang, Y. Fan, D. Yue, Y. Liang, J. Yang, T.S. Huang, Youtube-VOS: Sequence-to-Sequence Video Object Segmentation, in: European Conference on Computer Vision, 2018, pp. 603–619.
M. Kristan, A. Leonardis, J. Matas, et al., The eighth visual object tracking VOT2020 challenge results, in: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, 2020, pp. 547–601.
Li, 2013, Image fusion with guided filtering, IEEE Trans. Image Process., 22, 2864, 10.1109/TIP.2013.2244222
K. Ram Prabhakar, V. Sai Srikar, R. Venkatesh Babu, Deepfuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4714–4722.
Xu, 2022, U2Fusion: A unified unsupervised image fusion network, IEEE Trans. Pattern Anal. Mach. Intell., 44, 502, 10.1109/TPAMI.2020.3012548
Cheng, 2021, UNIFusion: A lightweight unified image fusion network, IEEE Trans. Instrum. Meas., 70, 1
Li, 2020, MDLatLRR: A novel decomposition method for infrared and visible image fusion, IEEE Trans. Image Process., 29, 4733, 10.1109/TIP.2020.2975984
Bhat, 2019, Learning discriminative model prediction for tracking, 6181
Q. Wang, L. Zhang, L. Bertinetto, W. Hu, P.H.S. Torr, Fast Online Object Tracking and Segmentation: A Unifying Approach, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 1328–1338.
Xu, 2023, Toward robust visual object tracking with independent target-agnostic detection and effective siamese cross-task interaction, IEEE Trans. Image Process., 32, 1541, 10.1109/TIP.2023.3246800
L. Bertinetto, J. Valmadre, J.F. Henriques, A. Vedaldi, P.H.S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, in: European Conference on Computer Vision Workshops, ECCVW, 2016, pp. 850–865.
Ren, 2017, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell., 39, 1137, 10.1109/TPAMI.2016.2577031
B. Li, J. Yan, W. Wu, Z. Zhu, X. Hu, High Performance Visual Tracking with Siamese Region Proposal Network, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2018, pp. 8971–8980.
H. Fan, H. Ling, Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 7944–7953.
K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, in: 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 770–778.
Li, 2019, Hierarchical spatial-aware siamese network for thermal infrared object tracking, Knowl.-Based Syst., 166, 71, 10.1016/j.knosys.2018.12.011
Q. Liu, X. Li, Z. He, N. Fan, D. Yuan, W. Liu, Y. Liang, Multi-Task Driven Feature Models for Thermal Infrared Tracking, in: Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020, pp. 11604–11611.
M. Felsberg, A. Berg, G. Hager, et al., The Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge Results, in: 2015 IEEE/CVF International Conference on Computer Vision Workshop, ICCVW, 2015, pp. 639–651.
M. Danelljan, G. Häger, F.S. Khan, M. Felsberg, Learning Spatially Regularized Correlation Filters for Visual Tracking, in: 2015 IEEE/CVF International Conference on Computer Vision, ICCV, 2015, pp. 4310–4318.
G. Zhu, F. Porikli, H. Li, Beyond Local Search: Tracking Objects Everywhere with Instance-Specific Proposals, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 943–951.
Yu, 2017, Dense structural learning for infrared object tracking at 200+ frames per second, Pattern Recognit. Lett., 100, 152, 10.1016/j.patrec.2017.10.026
N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: 2005 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2005, pp. 886–893.
Zhang, 2019, Synthetic data generation for end-to-end thermal infrared tracking, IEEE Trans. Image Process., 28, 1837, 10.1109/TIP.2018.2879249
M. Danelljan, G. Bhat, F.S. Khan, M. Felsberg, ECO: Efficient Convolution Operators for Tracking, in: 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2017, pp. 6931–6939.
Goodfellow, 2014, Generative adversarial networks, Adv. Neural Inf. Process. Syst., 3, 2672
Liu, 2020, Learning deep multi-level similarity for thermal infrared object tracking, IEEE Trans. Multimed., 23, 2114, 10.1109/TMM.2020.3008028
Cheng, 2023, MUFusion: A general unsupervised image fusion network based on memory unit, Inf. Fusion, 92, 80, 10.1016/j.inffus.2022.11.010
Li, 2023
Hu, 2023, ZMFF: Zero-shot multi-focus image fusion, Inf. Fusion, 92, 127, 10.1016/j.inffus.2022.11.014
Radford, 2021, Learning transferable visual models from natural language supervision, 8748
Zhang, 2020, Object fusion tracking based on visible and infrared images: A comprehensive review, Inf. Fusion, 63, 166, 10.1016/j.inffus.2020.05.002
X.-F. Zhu, T. Xu, Z. Tang, Z. Wu, H. Liu, X. Yang, X.-J. Wu, J. Kittler, RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking, in: Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023.
Mihaylova, 2006, The influence of multi-sensor video fusion on object tracking using a particle filter
N. Cvejic, S.G. Nikolov, H.D. Knowles, A. Loza, A. Achim, D.R. Bull, C.N. Canagarajah, The Effect of Pixel-Level Fusion on Object Tracking in Multi-Sensor Surveillance Video, in: 2007 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2007, pp. 1–7.
T. Dixon, J. Li, J. Noyes, T. Troscianko, S. Nikolov, J. Lewis, E. Canga, D. Bull, C. Canagarajah, Scanpath Analysis of Fused Multi-Sensor Images with Luminance Change: A Pilot Study, in: 2006 9th International Conference on Information Fusion, 2006, pp. 1–8.
Zhu, 2021, Quality-aware feature aggregation network for robust RGBT tracking, IEEE Trans. Intell. Veh., 6, 121, 10.1109/TIV.2020.2980735
Y. Gao, C. Li, Y. Zhu, J. Tang, T. He, F. Wang, Deep Adaptive Fusion Network for High Performance RGBT Tracking, in: 2019 IEEE/CVF International Conference on Computer Vision Workshop, ICCVW, 2019, pp. 91–99.
Xu, 2021, Multimodal cross-layer bilinear pooling for RGBT tracking, IEEE Trans. Multimed., 24, 567, 10.1109/TMM.2021.3055362
Zhang, 2019, SiamFT: An RGB-infrared fusion tracking method via fully convolutional Siamese networks, IEEE Access, 7, 122122, 10.1109/ACCESS.2019.2936914
Zhang, 2020, DSiamMFT: An RGB-T fusion tracking method via dynamic Siamese networks using multi-layer feature fusion, Signal Process., Image Commun., 84, 10.1016/j.image.2019.115756
Li, 2018, Fusing two-stream convolutional neural networks for RGB-T object tracking, Neurocomputing, 281, 78, 10.1016/j.neucom.2017.11.068
Y. Zhu, C. Li, B. Luo, J. Tang, X. Wang, Dense Feature Aggregation and Pruning for RGBT Tracking, in: Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 465–472.
Zhu, 2022, RGBT tracking by trident fusion network, IEEE Trans. Circuits Syst. Video Technol., 32, 579, 10.1109/TCSVT.2021.3067997
Li, 2020, Challenge-aware RGBT tracking, 222
H. Nam, B. Han, Learning Multi-domain Convolutional Neural Networks for Visual Tracking, in: 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 4293–4302.
Zhang, 2021, Jointly modeling motion and appearance cues for robust RGB-T tracking, IEEE Trans. Image Process., 30, 3335, 10.1109/TIP.2021.3060862
Tang, 2022
Luo, 2019, Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme, Infrared Phys. Technol., 99, 265, 10.1016/j.infrared.2019.04.017
K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, in: International Conference on Learning Representations, 2015, pp. 1–14.
A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in: Neural Information Processing Systems, 2012, pp. 1097–1105.
Li, 2021, RFN-Nest: An end-to-end residual fusion network for infrared and visible images, Inf. Fusion, 73, 72, 10.1016/j.inffus.2021.02.023
Russakovsky, 2015, ImageNet large scale visual recognition challenge, Int. J. Comput. Vis., 115, 211, 10.1007/s11263-015-0816-y
Z. Zhang, H. Peng, Deeper and Wider Siamese Networks for Real-Time Visual Tracking, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 4586–4595.
Pengyu Zhang, 2021, Learning adaptive attribute-driven representation for real-time RGB-T tracking, Int. J. Comput. Vis., 129, 2714, 10.1007/s11263-021-01495-3
Lu, 2022, Duality-gated mutual condition network for RGBT tracking, IEEE Trans. Neural Netw. Learn. Syst., 1