Neural Networks for Classification and Unsupervised Segmentation of Visibility Artifacts on Monocular Camera Image
Abstract
For the computer vision systems of autonomous vehicles, ensuring the high reliability of the visual information coming from on-board cameras is an important task. Frequent problems are contamination of the camera lens, its defocusing due to mechanical damage, and image motion blur in low-light conditions. In this work, we propose a novel neural network approach to the classification and unsupervised segmentation of visibility artifacts on monocular camera images. It is based on a compact deep classification network with an integrated modification of a gradient-based method for generating class activation maps and segmentation masks. We present a new dataset named Visibility Artifacts containing over 22 300 images of six common artifact types: complete loss of camera visibility, strong or partial contamination, rain or snow drops, motion blur, and defocus. To assess the quality of artifact localization, we additionally labeled a small test set with ground-truth masks. This allowed us to objectively and quantitatively compare various methods for constructing class activation maps (CAMERAS, FullGrad, original and modified Grad-CAM, Layer-CAM), the best of which achieved segmentation quality above 54% mIoU without any supervision, a promising result. Experiments on the developed dataset demonstrated the superiority of the ResNet-18_U classification network (99.37% test accuracy) over more complex convolutional (ResNet-34, ResNeXt-50, EfficientNet-B0) and transformer (ViT-Ti, DeiT-Ti) networks. The code of the proposed method and the dataset are publicly available at https://github.com/vd-kuznetsov/CaUS_Visibility_Artifacts.
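The evaluation procedure described in the abstract (binarizing a class activation map into a segmentation mask and scoring it against a ground-truth mask with IoU) can be sketched in a few lines of NumPy. This is only a minimal illustration of the general technique; the function names, the 0.5 threshold, and the toy data below are assumptions, not the paper's exact procedure.

```python
import numpy as np


def cam_to_mask(cam, threshold=0.5):
    """Min-max normalize a 2-D class activation map to [0, 1] and binarize it.

    The 0.5 threshold is a hypothetical choice; the paper does not state
    the exact value used.
    """
    cam = cam.astype(np.float64)
    cam -= cam.min()
    peak = cam.max()
    if peak > 0:
        cam /= peak
    return cam >= threshold


def iou(pred, gt):
    """Intersection-over-union between two boolean masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union


# Toy example: a synthetic CAM with one bright "artifact" region that
# coincides exactly with the ground-truth mask.
cam = np.zeros((4, 4))
cam[1:3, 1:3] = 5.0
gt = np.zeros((4, 4), bool)
gt[1:3, 1:3] = True
print(iou(cam_to_mask(cam), gt))  # 1.0 on this toy example
```

The mIoU reported in the abstract would then be the mean of such per-image (or per-class) IoU values over the labeled test set.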
References
Soboleva, V. and Shipitko, O., Raindrops on windshield: Dataset and lightweight gradient-based detection algorithm, in 2021 IEEE Symposium Series on Computational Intelligence (SSCI), 2021, pp. 1–7.
Ivanov, A. and Yudin, D., Visibility loss detection for video camera using deep convolutional neural networks, in International Conference on Intelligent Information Technologies for Industry, 2018, pp. 434–443.
Xia, J., Xuan, D., Tan, L., and Xing, L., ResNet15: Weather recognition on traffic road with deep convolutional neural network, Adv. Meteorol., 2020.
Yu, T., Kuang, Q., Hu, J., Zheng, J., and Li, X., Global-similarity local-salience network for traffic weather recognition, IEEE Access, 2020, vol. 9, pp. 4607–4615.
Wang, D., Zhang, T., Zhu, R., Li, M., and Sun, J., Extreme image classification algorithm based on multicore dense connection network, Math. Probl. Eng., 2021.
Dhananjaya, M.M., Kumar, V.R., and Yogamani, S., Weather and light level classification for autonomous driving: Dataset, baseline and active learning, in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 2816–2821.
Ronneberger, O., Fischer, P., and Brox, T., U-Net: Convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241.
Chen, L.C. et al., Encoder-decoder with atrous separable convolution for semantic image segmentation, in Proceedings of the European Conference on Computer Vision, 2018, pp. 801–818.
Wang, J. et al., Deep high-resolution representation learning for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell., 2020, vol. 43, no. 10, pp. 3349–3364.
Xie, E. et al., SegFormer: Simple and efficient design for semantic segmentation with transformers, Adv. Neural Inform. Process. Syst., 2021, vol. 34.
Shepel, I., Adeshkin, V., Belkin, I., and Yudin, D.A., Occupancy grid generation with dynamic obstacle segmentation in stereo images, IEEE Trans. Intell. Transp. Syst., 2021.
Roser, M. and Geiger, A., Video-based raindrop detection for improved image registration, in 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009, pp. 570–577.
You, S. et al., Adherent raindrop detection and removal in video, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1035–1042.
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D., Grad-CAM: Visual explanations from deep networks via gradient-based localization, in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.
Chattopadhay, A., Sarkar, A., Howlader, P., and Balasubramanian, V.N., Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks, in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 839–847.
Fu, R., Hu, Q., Dong, X., Guo, Y., Gao, Y., and Li, B., Axiom-based Grad-CAM: Towards accurate visualization and explanation of CNNs, arXiv:2008.02312, 2020.
Omeiza, D. et al., Smooth Grad-CAM++: An enhanced inference level visualization technique for deep convolutional neural network models, arXiv:1908.01224, 2019.
Jiang, P.T. et al., LayerCAM: Exploring hierarchical class activation maps for localization, IEEE Trans. Image Process., 2021, vol. 30, pp. 5875–5888.
Srinivas, S. and Fleuret, F., Full-gradient representation for neural network visualization, Adv. Neural Inform. Process. Syst., 2019, vol. 32.
Jalwana, M.A., Akhtar, N., Bennamoun, M., and Mian, A., CAMERAS: Enhanced resolution and sanity preserving class activation mapping for image saliency, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16327–16336.
He, K., Zhang, X., Ren, S., and Sun, J., Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
Ramanna, S., Sengoz, C., Kehler, S., and Pham, D., Near real-time map building with multi-class image set labeling and classification of road conditions using Convolutional Neural Networks, Appl. Artif. Intell., 2021, vol. 35, no. 11, pp. 803–833.
Dahmane, K. et al., WeatherEye-proposal of an algorithm able to classify weather conditions from traffic camera images, Atmosphere, 2021, vol. 12, no. 6, p. 717.
Sun, Z. et al., A practical weather detection method built in the surveillance system currently used to monitor the large-scale freeway in China, IEEE Access, 2020, vol. 8, pp. 112357–112367.
Zoph, B. et al., Learning transferable architectures for scalable image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8697–8710.
Zhou, B. et al., Learning deep features for discriminative localization, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2921–2929.
Ramaswamy, H.G., Ablation-CAM: Visual explanations for deep convolutional network via gradient-free localization, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 983–991.
Wang, H. et al., Score-CAM: Score-weighted visual explanations for convolutional neural networks, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 24–25.
Muhammad, M.B. et al., Eigen-CAM: Class activation map using principal components, in 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, 2020, pp. 1–7.
Naidu, R., Ghosh, A., Maurya, Y., and Kundu, S.S., IS-CAM: Integrated Score-CAM for axiomatic-based explanations, arXiv:2010.03023, 2020.
Yang, S. et al., Combinational class activation maps for weakly supervised object localization, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 2941–2949.
Wang, H., Naidu, R., Michael, J., and Kundu, S.S., SS-CAM: Smoothed Score-CAM for sharper visual feature localization, arXiv:2006.14255, 2020.
Xie, S. et al., Aggregated residual transformations for deep neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.
Tan, M. and Le, Q., EfficientNet: Rethinking model scaling for convolutional neural networks, in International Conference on Machine Learning, 2019, pp. 6105–6114.
Steiner, A. et al., How to train your ViT? Data, augmentation, and regularization in vision transformers, arXiv:2106.10270, 2021.
Touvron, H. et al., Training data-efficient image transformers and distillation through attention, in International Conference on Machine Learning, 2021, pp. 10347–10357.
ColorMaps in OpenCV. https://docs.opencv.org/4.x/d3/d50/group__imgproc__colormap.html.
Yogamani, S. et al., Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving, in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9308–9318.
Yang, G., Song, X., Huang, C., Deng, Z., Shi, J., and Zhou, B., Drivingstereo: A large-scale dataset for stereo matching in autonomous driving scenarios, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 899–908.
Sakaridis, C., Dai, D., and Van Gool, L., ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding, in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10765–10775.
Jung, A.B. et al., Imgaug, 2020. https://github.com/aleju/imgaug.