Panoramic Image-Based Structure Recovery via Planar Depth Map Learning

Neural Computing and Applications - Volume 35 - Pages 24407–24433 - 2023
Ming Meng1, Likai Xiao2, Zhong Zhou2,3
1School of Data Science and Media Intelligence, Communication University of China, Beijing, China
2State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China
3Zhongguancun Laboratory, Beijing, China

Abstract

Scene structure recovery supports scene reconstruction and understanding by extracting essential structural information from a scene, and it is widely used in smart cities, VR/AR, and intelligent robot navigation. Panoramic images, with a 180° or 360° field of view, provide richer visual information than conventional images, which has made them an important research topic in computer vision and computational photography. However, recovering indoor scene structure is difficult: cluttered objects severely occlude important local regions, and panoramic images exhibit large nonlinear distortion. To address these limitations, we propose a geometry-based indoor structure recovery method built around planar depth map learning, which reduces the interference caused by occlusion in important local regions. We design a planar depth map learning network for panoramic images (OmniPDMNet) that uses upsampling and a feature-based loss function to estimate planar depth maps with high accuracy. We then use the panoramic planar depth map as prior knowledge and feed it into a structure recovery network (OmniSRNet), which extracts global structural features and improves the overall quality of structure recovery. We also introduce a distortion-aware module for extracting features from panoramic images; it adapts to panoramic geometric distortion and improves the performance of both OmniPDMNet and OmniSRNet. Finally, extensive experiments on panoramic datasets for planar depth estimation and structure recovery show that our method achieves state-of-the-art performance.
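
To make the notion of a planar depth map concrete, here is a minimal sketch. It is not the authors' OmniPDMNet and it learns nothing. It assumes an empty, axis-aligned box room, a camera at the origin with the y axis pointing up, and an equirectangular panorama whose rows run from latitude +90° (top) to −90° (bottom). Under these assumptions, the planar depth of a pixel is the distance along its viewing ray to the first layout plane (wall, floor, or ceiling) that the ray hits. Because only the layout planes are considered, furniture that occludes the room does not change this value. The function names `equirect_rays` and `box_planar_depth` and all of their parameters are hypothetical.

```python
import numpy as np

def equirect_rays(height, width):
    """Unit viewing ray for the center of each equirectangular pixel.

    Assumed convention (hypothetical): y up, longitude in [-pi, pi),
    latitude from +pi/2 at the top row to -pi/2 at the bottom row.
    """
    v, u = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)

def box_planar_depth(height, width, half_x, half_z, cam_height, ceil_height):
    """Planar depth map of an empty axis-aligned box room seen from inside.

    Hypothetical helper: every layout plane is written as n . p = d with an
    outward unit normal n and d > 0 (the camera sits at the origin).
    """
    rays = equirect_rays(height, width)
    planes = [
        (np.array([1.0, 0.0, 0.0]), half_x),                     # right wall
        (np.array([-1.0, 0.0, 0.0]), half_x),                    # left wall
        (np.array([0.0, 0.0, 1.0]), half_z),                     # front wall
        (np.array([0.0, 0.0, -1.0]), half_z),                    # back wall
        (np.array([0.0, 1.0, 0.0]), ceil_height - cam_height),   # ceiling
        (np.array([0.0, -1.0, 0.0]), cam_height),                # floor
    ]
    depth = np.full((height, width), np.inf)
    for normal, offset in planes:
        cos_angle = rays @ normal  # (H, W): cosine between ray and normal
        with np.errstate(divide="ignore"):
            # Ray parameter t where the ray meets the plane; rays pointing
            # away from the plane never hit it and get +inf.
            t = np.where(cos_angle > 1e-9, offset / cos_angle, np.inf)
        # In a convex room the first plane hit has the smallest positive t.
        depth = np.minimum(depth, t)
    return depth

depth = box_planar_depth(64, 128, half_x=2.0, half_z=3.0,
                         cam_height=1.6, ceil_height=2.8)
print(depth.shape)  # (64, 128)
```

The per-pixel minimum is valid only because a box is convex. A non-convex layout would need an explicit visibility test against the finite wall segments. The same ray grid also illustrates the nonlinear distortion mentioned above: a pixel row at latitude φ covers a horizontal arc proportional to cos φ, so content near the poles is stretched by roughly 1/cos φ.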

Keywords

#structure recovery #panoramic images #planar depth map learning #distortion awareness #computer vision
