Stereo Frustums: a Siamese Pipeline for 3D Object Detection

Journal of Intelligent and Robotic Systems - Volume 101 - Pages 1-15 - 2020
Xi Mo1, Usman Sajid1, Guanghui Wang2
1Department of Electrical Engineering and Computer Science, School of Engineering, University of Kansas, Lawrence, USA
2Department of Computer Science, Ryerson University, Toronto, Canada

Abstract

This paper proposes a lightweight stereo frustum matching module for 3D object detection. The proposed framework leverages a high-performance 2D detector and a point cloud segmentation network to regress 3D bounding boxes for autonomous driving vehicles. Instead of performing conventional stereo matching to compute disparities, the module directly takes 2D proposals from both the left and right views as input. Based on the epipolar constraints recovered from well-calibrated stereo cameras, we propose four matching algorithms to search for the best match for each proposal across the stereo image pair. Each matched pair contributes to scene segmentation, which is then fed into a 3D bounding box regression network. Results of extensive experiments on the KITTI dataset demonstrate that the proposed Siamese pipeline outperforms state-of-the-art stereo-based 3D bounding box regression methods.
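The core idea of matching 2D proposals across a rectified stereo pair can be illustrated with a minimal sketch. For a well-calibrated, rectified rig, the epipolar constraint means corresponding points lie on the same image row, so matching boxes share nearly the same vertical extent and the right-view box is shifted left (non-negative disparity). The function names, greedy search, and vertical-overlap score below are illustrative assumptions, not the paper's four matching algorithms:

```python
# Hypothetical sketch of epipolar-constrained proposal matching for a
# rectified stereo pair. Boxes are (x1, y1, x2, y2) in pixel coordinates.

def vertical_iou(box_l, box_r):
    """Overlap ratio of the vertical extents [y1, y2] of two boxes.
    Under rectification, true stereo matches lie on the same scanlines,
    so their vertical extents should almost coincide."""
    y1 = max(box_l[1], box_r[1])
    y2 = min(box_l[3], box_r[3])
    inter = max(0.0, y2 - y1)
    union = (box_l[3] - box_l[1]) + (box_r[3] - box_r[1]) - inter
    return inter / union if union > 0 else 0.0

def match_proposals(left_boxes, right_boxes, min_v_iou=0.7):
    """Greedy best-match search: each left-view proposal picks the unused
    right-view proposal with the highest vertical overlap, subject to the
    non-negative-disparity constraint (right box lies to the left)."""
    matches, used = [], set()
    for i, bl in enumerate(left_boxes):
        best_j, best_s = -1, min_v_iou
        for j, br in enumerate(right_boxes):
            if j in used:
                continue
            if br[0] > bl[0]:  # disparity would be negative: impossible match
                continue
            s = vertical_iou(bl, br)
            if s > best_s:
                best_j, best_s = j, s
        if best_j >= 0:
            used.add(best_j)
            matches.append((i, best_j))
    return matches
```

Each matched pair of boxes then defines a pair of viewing frustums whose intersection bounds the object's points for the downstream segmentation and box-regression stages.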

Keywords

#3D object detection #3D bounding box #Stereo frustum matching #Point cloud segmentation network #Siamese framework
