Spatial attention-guided deformable fusion network for salient object detection

Springer Science and Business Media LLC - Volume 29 - Pages 2563-2573 - 2023
Aiping Yang1,2, Yan Liu, Simeng Cheng1, Jiale Cao1, Zhong Ji1, Yanwei Pang1
1School of Electrical and Information Engineering, Tianjin University, Tianjin, China
2Shanghai Artificial Intelligence Laboratory, Shanghai, China

Abstract

Most salient object detection methods adopt a U-shaped architecture as their underlying structure. Although they achieve promising performance, they struggle to detect salient objects with non-rigid shapes and arbitrary sizes. In addition, features are passed to the decoder directly, without discrimination or active selection, which leaves prominent features underutilized. To address these issues, we propose a spatial-attention-guided deformable fusion network for salient object detection, which consists of a contour enhancement module (CEM), a spatial-attention-guided deformable fusion module (SADFM), and a gate module (GM). Specifically, the CEM is designed to obtain global features, aiming to reduce the loss of high-level features during transfer. The SADFM uses spatial attention to guide deformable convolution so that global, high-level, and low-level features are aggregated adaptively. Furthermore, the GM refines the initial fused features and predicts the salient regions accurately. Experiments on five public datasets verify the effectiveness of our method.
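The abstract gives no implementation details, so the following is only a toy sketch of the general idea behind attention-modulated deformable sampling followed by a gate. It is not the authors' SADFM or GM. All function names (`bilinear_sample`, `attention_guided_fusion`, `gate_refine`), the single-channel feature maps, the sigmoid attention, and the hand-supplied offsets are assumptions made for illustration; in a real network the offsets and attention would be predicted by learned convolution layers.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bilinear_sample(channel, y, x):
    """Sample a 2D map at a fractional (y, x); out-of-range taps read as 0."""
    h, w = len(channel), len(channel[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * channel[yy][xx]
    return value


def attention_guided_fusion(low, high, offsets):
    """Fuse two single-channel H x W maps (hypothetical simplification).

    offsets[y][x] = (dy, dx) stands in for learned deformable offsets.
    The attention map here is simply sigmoid(high), an assumption.
    """
    h, w = len(low), len(low[0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = offsets[y][x]
            att = sigmoid(high[y][x])  # spatial attention weight in (0, 1)
            sampled = bilinear_sample(high, y + dy, x + dx)  # deformed tap
            fused[y][x] = low[y][x] + att * sampled
    return fused


def gate_refine(fused):
    """Toy gate: scale each response by its own sigmoid (assumed form)."""
    return [[sigmoid(v) * v for v in row] for row in fused]


if __name__ == "__main__":
    low = [[1.0, 2.0], [3.0, 4.0]]
    high = [[0.0, 2.0], [4.0, 6.0]]
    offsets = [[(0.5, 0.5), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
    print(gate_refine(attention_guided_fusion(low, high, offsets)))
```

With zero offsets the sampling reduces to an ordinary per-pixel weighted sum; non-zero offsets let each position read the high-level map from a shifted, fractional location, which is the property that makes deformable sampling attractive for non-rigid object shapes.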
