Hierarchical MVSNet with cost volume separation and fusion based on U-shape feature extraction

Springer Science and Business Media LLC - Tập 29 - Trang 377-387 - 2022
Wanjun Liu1, Junkai Wang1, Haicheng Qu1, Lei Shen1
1School of Software, Liaoning Technical University, Huludao, People’s Republic of China

Tóm tắt

Multi-view stereo (MVS) methods based on deep learning have developed rapidly in recent years, but inaccuracies in reconstruction due to the general effect of feature extraction and poor correlation between cost volumes are still present, opening possibilities for improvement in reconstruction accuracy and completeness. We therefore develop a hierarchical MVS network model with cost volume separation and fusion to mitigate these problems. First, to obtain a more complete and accurate feature information from the input images, a U-shape feature extraction module was designed that outputs feature information simultaneously according to a hierarchical structure composed of three different scales. Then, to enhance the learning ability of the network structure for features, we introduced attention mechanisms to the extracted features that focus on and learn the highlighted features. Finally, in the cost volume regularization stage, a cost volume separation and fusion module was designed in the structure of a hierarchical cascade. This module separates the information within the small-scale cost volume, passes it to the lower level cost volume for fusion, and performs a coarse-to-fine depth map estimation. This model results in substantial improvements in reconstruction accuracy and completeness. The results of extensive experiments on the DTU dataset show that our method performs better than Cascade-MVSNet by about 10.2% in accuracy error (acc.), 7.6% in completeness error (comp.), and 9.0% in overall error (overall), with similar performance in the reconstruction completeness, showing the validity of our module.

Tài liệu tham khảo

Schönberger, J.L., Zheng, E., Frahm, J.M., Pollefeys, M.: Pixelwise view selection for unstructured multi-view stereo. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer vision–ECCV 2016. Lecture notes in computer science, vol. 9907. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_31 Schönberger, J. L., Frahm, J.: Structure-from-Motion Revisited. 2016 IEEE conference on computer vis-ion and pattern recognition (CVPR), pp. 4104–4113, (2016). https://doi.org/10.1109/CVPR.2016.445 Campbell, N.D.F., Vogiatzis, G., Hernández, C., Cipolla, R.: Using multiple hypotheses to improve depth-maps for multi-view stereo. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) Computer vision–ECCV 2008 lecture notes in computer science, vol. 5302. Springer, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88682-2_58 Furukawa, Y., Ponce, J.: Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 32(8), 1362–1376 (2010). https://doi.org/10.1109/TPAMI.2009.161 Galliani, S., Lasinger, K., Schindler, K.: Massi-vely parallel multiview stereopsis by surface norm-al diffusion. 2015 IEEE International Confer-ence on Computer Vision (ICCV), pp. 873–881. (2015). https://doi.org/10.1109/ICCV.2015.106 Tola, E., Strecha, C., Fua, P.: Efficient large-scale multi-view stereo for ultra high-resolution image sets. Mach. Vis. Appl. 23, 903–920 (2012). https://doi.org/10.1007/s00138-011-0346-8 Hartmann, W., Galliani, S., Havlena, M., Van Gool, L., Schindler, K.: Learned Multi-patch Similarity. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1595–1603, (2017) https://doi.org/10.1109/ICCV.2017.176 Ji, M., Gall, J., Zheng, H., Liu, Y., Fang, L.: SurfaceNet: an end-to-end 3d neural network for multiview stereopsis. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2326–2334, (2017). https://doi.org/10.1109/ICCV.2017.253 Kar, A., Hane, C., Malik, J.: Learning a multi-view stereo machine. Adv Neural Inf Process Syst (NIPS) (2017). https://arxiv.org/pdf/1708.05375.pdf Huang, P. H., Matzen, K., Kopf, J., Ahuja, N., Hu-ang, J. B.: DeepMVS: learning multi-view stereo-psis. In Proceedings-2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. (2018). pp. 2821–2830. https://doi.org/10.1109/CVPR.2018.00298 Yao, Y., Luo, Z., Li, S., Fang, T., Quan, L.: MVSNet: depth inference for unstructured multi-view Stereo. ArXiv, abs/1804.02505. (2018). https://doi.org/10.1007/978-3-030-01237-3_47. Yao, Y., Luo, Z., Li, S., Shen, T., Fang, T., Quan, L.: Recurrent MVSNet for high-resolution multi-view stereo depth inference. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5520–5529, (2019). https://doi.org/10.1109/CVPR.2019.00567. Chen, R., Han, S., Xu, J., Su, H.: Point-Based Multi-View Stereo Network. 2019 IEEE/CVF Inter-national Conference on Computer Vision (ICCV), pp. 1538–1547. (2019). https://doi.org/10.1109/ICCV.2019.00162 Luo, K., Guan, T., Ju, L., Huang, H, Luo, Y.: P-MVSNet: learning patch-wise matching confi-dence aggregation for multi-view stereo. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10451–10460, (2019). https://doi.org/10.1109/ICCV.2019.01055 Gu, X., Fan, Z., Zhu, S., Dai, Z., Tan, F., Tan, P.: Cascade cost volume for high-resolution multi-view stereo and stereo matching. 2020 IEEE/CVF Conference on Computer Vision and Pattern Reco-gnition (CVPR), pp. 2492–2501. (2020). https://doi.org/10.1109/CVPR42600.2020.00257 Yu, Z., Gao, S.: Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement. 2020 IEEE/CVF Confer-ence on Computer Vision and Pattern Recognition (CVPR), pp. 1946–1955. (2020). https://doi.org/10.1109/cvpr42600.2020.00202 Aanæs, H., Jensen, R.R., Vogiatzis, G., et al.: Large-scale data for multiple-view stereopsis. Int. J. Comput. Vis. 120, 153–168 (2016). https://doi.org/10.1007/s11263-016-0902-9 Furukawa, Y., Ponce, J.: Carved visual hulls for image-based modeling. Int. J. Comput. Vision 81(1), 53–67 (2009). https://doi.org/10.1007/s11263-008-0134-8 Li, Z., Wang, K., Zuo, W., Meng, D., Zhang, L.: Detail-preserving and content-aware variational multi-view stereo reconstruction. IEEE Trans Image Process 25(2), 864–877 (2016). https://doi.org/10.1109/TIP.2015.2507400 Seitz, S.M., Dyer, C.R.: Photorealistic scene recons-truction by voxel coloring. Int J Computer Vision (IJCV) (1999). https://doi.org/10.1109/CVPR.1997.609462 Merrell, P.C., Akbarzadeh, A., Wang, L., Mordohai, P., Frahm, J., Yang, R., Nistér, D., Pollefeys, M.: Real-time visibility-based fusion of depth maps. 2007 IEEE 11th International Conference on Computer Vision, 1–8. (2007). https://doi.org/10.1109/ICCV.2007.4408984 Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A., Kohli, P., Shotton, J., Hodges, S., Fitzgibbon, A.: KinectFusion: real-time dense surface mapping and tracking. 2011 10th IEEE International Symposium on Mixed and Augmented Reality, 127–136. (2011). https://doi.org/10.1109/ISMAR.2011.6092378 Woo, S., Park, J., Lee, J., Kweon, I.: CBAM: convolutional block attention module. (ECCV) (2018). https://doi.org/10.1007/978-3-030-01234-2_1. Yang, G., Manela, J., Happold, M., Ramanan, D.: Hierarchical deep stereo matching on high-resolution images. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5510–5519, (2019). https://doi.org/10.1109/CVPR.2019.00566 Luo, W., Schwing, A. G., Urtasun, R.: Efficient deep learning for stereo matching. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5695–5703, (2016). https://doi.org/10.1109/CVPR.2016.614 Zbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J Mach Learn Res 17(65), 1–65 (2016) Xue Y, Chen J, Wan W, et al.: MVSCRF: learning multi-view stereo with conditional random fields[C]. International Conference on Computer Vision, 2019: 4311–4320 Yu Z, Gao S.: Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement[C]. Conference on Computer Vision and Pattern Recognition, 2020: 1946–1955 Yang J, Mao W, Álvarez J, et al.: cost volume pyramid based depth inference for multi-view stereo[C]. Conference on Computer Vision and Pattern Recognition, 2020: 4876–4885 Xu Q, Tao W.: Learning inverse depth regression for multi-view stereo with correlation cost volume[C]. AAAI, 2020: 12508–12515 Yan J, Wei Z, Yi H, et al.: Dense hybrid recurrent multi-view stereo net with dynamic consistency checking[C]. ECCV, 2020: 674–689