Semi-supervised attention-based synthesis network with hybrid dilated convolution module for few-shot HDR video reconstruction

Fengshan Zhao1, Qin Liu2, Takeshi Ikenaga1
1Graduate School of IPS, Waseda University, Kitakyushu, Japan
2State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Abstract

Deep-learning-based methods for high dynamic range (HDR) video reconstruction require collecting large-scale HDR video datasets with ground truth, which is very time-consuming. Recent training strategies following the few-shot learning paradigm, which aim to build an effective model from only a few labeled samples, have shown success in image classification and image segmentation. In this paper, a semi-supervised learning framework for few-shot HDR video reconstruction is proposed. An attention-based synthesis network with a hybrid dilated convolution module is used to recover missing content and remove undesirable artifacts. The hybrid dilated convolution module extracts complementary features from poorly exposed regions, and the attention module modulates them to suppress harmful information. In the semi-supervised framework, loss functions designed for the supervised branch and the unsupervised branch constrain the network during training under the few-shot scenario. Experimental results show that the proposed method, trained with only 5 labeled samples and 45 unlabeled samples, achieves a PSNR of 41.664 dB on the synthetic evaluation dataset, compared with 35.201 dB, the best score among supervised methods trained under the same few-shot conditions.
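To make the architectural idea concrete, the following is a minimal sketch (not the authors' implementation) of a hybrid dilated convolution block whose multi-rate features are modulated by a sigmoid attention map, in the spirit of the modules described above. The channel count, dilation rates, and layer layout are illustrative assumptions.

```python
# Sketch of a hybrid dilated convolution block with attention modulation.
# All hyperparameters (channels, dilation rates) are assumptions for illustration.
import torch
import torch.nn as nn

class HybridDilatedBlock(nn.Module):
    def __init__(self, channels: int = 64, rates=(1, 2, 4)):
        super().__init__()
        # Parallel 3x3 convolutions with different dilation rates enlarge the
        # receptive field over poorly exposed regions without losing resolution.
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates]
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)
        # Attention: a sigmoid-gated map suppresses harmful responses
        # (e.g. from saturated or noisy areas) before features are passed on.
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        fused = self.fuse(feats)
        return fused * self.attn(x)  # attention-modulated multi-rate features


if __name__ == "__main__":
    block = HybridDilatedBlock()
    out = block(torch.randn(1, 64, 128, 128))
    print(out.shape)  # torch.Size([1, 64, 128, 128])
```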

Keywords

#Deep learning #HDR video reconstruction #Semi-supervised learning #Synthesis network #Hybrid dilated convolution module #Few-shot learning
