Complementary spatial transformer network for real-time 3D object recognition

Journal of Real-Time Image Processing - Volume 20 - Pages 1-12 - 2023
K. P. Krishna Kumar1, Varghese Paul2
1APJ Abdul Kalam Technological University, CET Campus, Thiruvananthapuram, India
2Department of Computer Science and Engineering, Rajagiri School of Engineering and Technology, Kochi, India

Abstract

Tiny deep learning models offer many advantages in a range of applications. From the perspective of statistical machine learning theory, the contribution of this paper is to complement the research advances and results obtained so far in real-time 3D object recognition. We propose a tiny deep learning model named Complementary Spatial Transformer Network (CSTN) for real-time 3D object recognition. CSTN's operation and analysis turn out to be much simpler in a target space setting. We make algorithmic enhancements that speed up CSTN computations and keep the learned part of CSTN to a minimal size. Finally, we experimentally verify the results on the publicly available point cloud data sets ModelNet40 and ShapeNetCore, where our model achieves a 1.65–2 times higher DPS (detections/s) rate on GPU hardware for 3D object recognition than state-of-the-art networks. The CSTN architecture requires only 10–35% of the trainable parameters of state-of-the-art networks, making it easier to deploy on edge AI devices.
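
The two comparison metrics quoted above, the DPS rate and the share of trainable parameters, can be illustrated with a minimal sketch. It is not the authors' benchmark code: the function names, the `predict` callable, and the layer shapes are hypothetical, and timing real GPU inference would also require synchronizing the device before reading the clock.

```python
import time
from math import prod


def detections_per_second(predict, samples, repeats=3):
    """Best observed throughput of `predict` over `samples`, in detections/s.

    `predict` is a hypothetical stand-in for a model's inference call.
    """
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(samples) / elapsed)
    return best


def parameter_count(shapes):
    """Total number of scalars across parameter tensors given by their shapes."""
    return sum(prod(shape) for shape in shapes)


def parameter_ratio(small_shapes, large_shapes):
    """Fraction of the larger model's parameters that the smaller model uses."""
    return parameter_count(small_shapes) / parameter_count(large_shapes)


if __name__ == "__main__":
    # Hypothetical layer shapes (weight matrix plus bias), for illustration only.
    tiny = [(3, 64), (64,)]
    baseline = [(3, 640), (640,)]
    print(f"parameter ratio: {parameter_ratio(tiny, baseline):.0%}")
    dps = detections_per_second(lambda s: sum(s), [(1.0, 2.0, 3.0)] * 1000)
    print(f"throughput: {dps:.0f} detections/s")
```

A speed-up such as the reported 1.65–2 times would then be the ratio of two such DPS measurements, taken on the same hardware and the same inputs.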
