Exploiting deep learning and augmented reality in fused deposition modeling: a focus on registration
Abstract
The current study proposes a Deep Learning (DL) based framework to retrieve, in real time, the position and rotation of an object in need of maintenance from live video frames only. To test the positioning performance, we focused on maintenance interventions on a generic Fused Deposition Modeling (FDM) 3D printer. Lastly, to demonstrate a possible Augmented Reality (AR) application that can be built on top of this, we discussed a specific case study using a Prusa i3 MK3S FDM printer. The method combines a You Only Look Once (YOLOv3) object detection network, which locates the FDM 3D printer, with a subsequent Rotation Convolutional Neural Network (RotationCNN), trained on a dataset of artificial images, which predicts the rotation parameters for attaching the 3D model. To train YOLOv3 we used an augmented dataset of 1653 real images, while to train the RotationCNN we used a dataset of 99,220 synthetic images showing the FDM 3D printer at different orientations, and fine-tuned it with 235 manually tagged real images. The YOLOv3 network obtained an Average Precision (AP) of 100% at an Intersection over Union (IoU) threshold of 0.5, while the RotationCNN showed a mean geodesic distance of 0.250 (σ = 0.210) and a mean accuracy of 0.619 (σ = 0.130) in detecting the correct rotation r, considering as acceptable the range [r − 10, r + 10]. We then evaluated the CAD system's performance with 10 non-expert users: the average completion time improved from 9.61 (σ = 1.53) to 5.30 (σ = 1.30), and the average number of actions needed to complete the task from 12.60 (σ = 2.15) to 11.00 (σ = 0.89). This work is a further step toward the adoption of DL and AR in the assistance domain. In future work, we will overcome the limitations of this approach and develop a complete mobile CAD system that could be extended to any object that has a 3D counterpart model.
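As a concrete illustration of the evaluation metrics above (a minimal sketch, not the authors' code), the geodesic distance between two rotation matrices and the ±10° acceptance check around a ground-truth rotation r could be computed as follows; the helper names (`geodesic_distance`, `rot_z`, `within_tolerance`) are hypothetical.

```python
import numpy as np

def geodesic_distance(R1: np.ndarray, R2: np.ndarray) -> float:
    """Geodesic distance (radians) between two 3x3 rotation matrices,
    i.e. the rotation angle of the relative rotation R1^T @ R2."""
    R = R1.T @ R2
    # Clip guards against tiny numerical overshoot outside [-1, 1].
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))

def rot_z(deg: float) -> np.ndarray:
    """Rotation matrix about the z-axis, angle given in degrees."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def within_tolerance(pred_deg: float, true_deg: float, tol: float = 10.0) -> bool:
    """True if a predicted angle falls in [true - tol, true + tol],
    with wrap-around handled on the 360-degree circle."""
    diff = abs((pred_deg - true_deg + 180.0) % 360.0 - 180.0)
    return diff <= tol
```

For example, `geodesic_distance(rot_z(0), rot_z(30))` returns 30° expressed in radians, and a prediction of 355° for a ground truth of 3° passes the ±10° check because the angular difference wraps to 8°.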