Spatio-Temporal Learning for Video Deblurring based on Two-Stream Generative Adversarial Network

Springer Science and Business Media LLC - Volume 53 - Pages 2701-2714 - 2021
Liyao Song1, Quan Wang2, Haiwei Li2, Jiancun Fan3, Bingliang Hu2
1School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an, China
2Key Laboratory of Spectral Imaging Technology, Xi’an Institute of Optics and Precision Mechanics of Chinese Academy of Sciences, Xi’an, China
3The School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an, China

Abstract

Deep learning approaches have achieved excellent results in video deblurring. Capturing the dynamic spatio-temporal information in videos is crucial for deblurring. In this paper, we propose a two-stream DeblurGAN that combines a 3D stream with a 2D stream. The 3D convolution provides spatial and temporal invariance to restore the foreground of frames, while the 2D convolution is sufficient to deal with spatial features, given a relatively consistent background. Our model thus uses the greater processing power of the 3D stream to handle the foreground, which usually contains more dynamic motion blur, and the simplicity of the 2D stream to handle the mostly consistent background, combining the strengths of both convolution types. We take the two-stream model as the generator and train it with adversarial learning. We test our model on the VideoDeblurring and GOPRO datasets and compare it with other methods. Our method outperforms the compared methods in Peak Signal-to-Noise Ratio (PSNR) and performs especially well on foreground regions with obvious motion blur.
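Since the comparison above is reported in PSNR, a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), may help fix ideas. The abstract does not state the evaluation protocol (bit depth, color handling, averaging over frames), so the 8-bit peak value, the single-channel nested-list frame layout, and the function name `psnr` are assumptions made only for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized frames.

    Frames are nested lists (rows of pixel intensities). The 8-bit peak
    value of 255 is an assumption, not something stated in the abstract.
    """
    ref_flat = [p for row in reference for p in row]
    out_flat = [p for row in restored for p in row]
    if not ref_flat or len(ref_flat) != len(out_flat):
        raise ValueError("frames must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((r - o) ** 2 for r, o in zip(ref_flat, out_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded

    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 frames: every pixel of the restored frame is off by one level.
sharp = [[100, 120], [140, 160]]
deblurred = [[101, 119], [141, 159]]
score = psnr(sharp, deblurred)
```

A higher value means the restored frame is closer to the sharp reference. For a video, one common choice is to average the per-frame values, though the abstract does not say which convention the authors use.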
