LatticeNet: fast spatio-temporal point cloud segmentation using permutohedral lattices
Abstract
Deep convolutional neural networks have shown outstanding performance in the task of semantically segmenting images. Applying the same methods to 3D data still poses challenges due to the heavy memory requirements and the lack of structure in the data. Here, we propose LatticeNet, a novel approach for 3D semantic segmentation that takes raw point clouds as input. A PointNet describes the local geometry, which we embed into a sparse permutohedral lattice. The lattice allows for fast convolutions while keeping a low memory footprint. Further, we introduce DeformSlice, a novel learned, data-dependent interpolation for projecting lattice features back onto the point cloud. We present results for 3D segmentation on multiple datasets, where our method achieves state-of-the-art performance. We also extend and evaluate our network for instance and dynamic object segmentation.
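To make the data flow concrete, below is a minimal, simplified sketch of the splat and slice operations that move features between points and lattice vertices: a point feature (e.g., a PointNet descriptor) is distributed to the vertices of its enclosing simplex with barycentric weights, and after convolutions on the occupied vertices the features are interpolated back to the point. The snippet uses a single 2D triangle as a stand-in for the full permutohedral simplex decomposition; all names and values are illustrative assumptions rather than the paper's implementation, and DeformSlice would replace the fixed slicing weights with learned, data-dependent ones.

```python
# Hedged sketch of barycentric splat/slice, the core mechanism behind
# permutohedral-lattice processing as described in the abstract.
# Simplified 2D stand-in: one triangle instead of the permutohedral simplices.
import numpy as np

def barycentric_weights(p, tri):
    """Barycentric coordinates of point p w.r.t. triangle vertices tri (3x2)."""
    a, b, c = tri
    m = np.column_stack((b - a, c - a))          # 2x2 basis spanned by two edges
    w1, w2 = np.linalg.solve(m, p - a)           # coordinates along those edges
    return np.array([1.0 - w1 - w2, w1, w2])     # weights sum to 1

# One input point carrying a feature vector (e.g., a local PointNet descriptor).
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
point = np.array([0.25, 0.25])
feature = np.array([1.0, 2.0, 3.0])

# Splat: distribute the point feature onto the enclosing simplex vertices.
w = barycentric_weights(point, tri)              # here: [0.5, 0.25, 0.25]
vertex_features = w[:, None] * feature[None, :]  # shape (3, feature_dim)

# (Convolutions would act on the sparse set of occupied lattice vertices here.)

# Slice: interpolate vertex features back to the point with the same weights.
# DeformSlice, per the abstract, instead learns data-dependent weights.
sliced = (w[:, None] * vertex_features).sum(axis=0)
print(w, sliced)
```

Note that splatting followed by plain barycentric slicing acts as a weighted blur rather than an identity, which hints at why a learned, data-dependent back-projection such as DeformSlice can recover sharper per-point features.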