Deformable Kernel Networks for Joint Image Filtering

Springer Science and Business Media LLC - Volume 129 - Pages 579-600 - 2020
Beomjun Kim¹, Jean Ponce², Bumsub Ham¹
¹ School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea
² Inria and DI-ENS, Département d’Informatique de l’ENS, CNRS, PSL University, Paris, France

Abstract

Joint image filters are used to transfer structural details from a guidance picture used as a prior to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size $640 \times 480$. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled $3 \times 3$ kernels outperforms the state of the art by a significant margin in all cases.
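
The weighted-averaging step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: in DKN the per-pixel neighbor offsets and weights are predicted by a CNN, whereas here they are plain input arrays. The function names `bilinear_sample` and `sparse_weighted_average`, the `(dy, dx)` offset layout, the use of bilinear interpolation for fractional positions, border clamping, and the normalization of weights to sum to one are all assumptions made for illustration.

```python
import numpy as np


def bilinear_sample(img, ys, xs):
    """Sample a 2-D image at fractional (ys, xs), clamping to the border."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def sparse_weighted_average(target, offsets, weights):
    """Filter `target` with K spatially-variant, sparsely sampled neighbors.

    target:  (H, W) image to be filtered.
    offsets: (H, W, K, 2) per-pixel neighbor displacements as (dy, dx).
    weights: (H, W, K) positive per-pixel weights (assumed; normalized here).
    """
    h, w = target.shape
    base_y, base_x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys = base_y[..., None] + offsets[..., 0]    # (H, W, K) sample rows
    xs = base_x[..., None] + offsets[..., 1]    # (H, W, K) sample columns
    samples = bilinear_sample(target, ys, xs)   # (H, W, K) neighbor values
    norm = weights / weights.sum(axis=-1, keepdims=True)
    return (norm * samples).sum(axis=-1)


# Usage: K = 9 neighbors per pixel, i.e. a sparsely sampled 3 x 3 kernel.
# Random offsets and weights stand in for the network outputs.
rng = np.random.default_rng(0)
image = rng.random((48, 64))
demo_offsets = rng.uniform(-3.0, 3.0, size=(48, 64, 9, 2))
demo_weights = rng.random((48, 64, 9)) + 0.1
filtered = sparse_weighted_average(image, demo_offsets, demo_weights)
```

With all offsets set to zero, every neighbor coincides with the center pixel and the output equals the input, which is a convenient sanity check for the sampling code.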
