Interactive guidance network for object detection based on radar-camera fusion
Multimedia Tools and Applications, pp. 1-19, 2023
Abstract
In recent years, the performance of image-based object detection algorithms has improved significantly, especially in the field of autonomous driving. Camera sensors, however, are susceptible to adverse weather conditions, which can significantly degrade their performance. In contrast, millimeter-wave radar is robust to such conditions. As a result, the fusion of millimeter-wave radar and camera sensors has gained considerable attention as a promising approach to object detection. Existing methods, however, rarely take into account the correlation between the two modalities, leaving detection results vulnerable to radar noise, visual blur, and other confounding factors. To address this challenge, we propose an interactive guidance network that leverages a cross-modal attention mechanism, enabling radar and camera sensors to guide each other and learn the underlying correlation between the two modalities. Our approach aims to achieve complementary feature fusion while effectively using information from both radar and camera sensors to enhance detection results. Moreover, we introduce a bi-directional fusion Feature Pyramid Network (FPN) structure, which generates feature maps with enhanced semantic and texture information. To assess the effectiveness of the proposed method, we conducted experiments on the nuScenes dataset. The results demonstrate that our approach outperforms existing state-of-the-art methods in terms of object detection accuracy.
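The abstract does not give the layer-level design of the cross-modal attention mechanism, so the following is only a minimal sketch of the general idea of two modalities guiding each other. It assumes plain scaled dot-product attention with a residual connection, applied once in each direction, on features flattened into lists of vectors. The function names (`cross_attend`, `interactive_guidance`) and the toy inputs are hypothetical, and learned projections, multi-head attention, and the bi-directional FPN are all omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """For each query vector, return an attention-weighted sum of context vectors.

    Assumption: queries and context share the same feature dimension and are
    used directly as query/key/value (no learned projections).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * k[d] for w, k in zip(weights, context)) for d in range(dim)]
        )
    return attended


def interactive_guidance(camera, radar):
    """Let each modality attend to the other, then add the result residually."""
    cam_from_radar = cross_attend(camera, radar)
    radar_from_cam = cross_attend(radar, camera)
    cam_guided = [
        [c + a for c, a in zip(tok, att)]
        for tok, att in zip(camera, cam_from_radar)
    ]
    radar_guided = [
        [r + a for r, a in zip(tok, att)]
        for tok, att in zip(radar, radar_from_cam)
    ]
    return cam_guided, radar_guided


# Toy example: three camera feature vectors and two radar feature vectors.
camera_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
radar_feats = [[0.5, 0.5], [2.0, 0.0]]
cam_out, radar_out = interactive_guidance(camera_feats, radar_feats)
print(len(cam_out), len(radar_out))
```

Each output keeps the shape of its own modality, so in a real detector the guided features could be passed on to a feature pyramid or detection head in place of the unguided ones.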