Multi-task temporal convolutional networks for joint recognition of surgical phases and steps in gastric bypass procedures

Springer Science and Business Media LLC - Volume 16 - Pages 1111-1119 - 2021
Sanat Ramesh1,2, Diego Dall’Alba1, Cristians Gonzalez3, Tong Yu2, Pietro Mascagni2,4, Didier Mutter3,5, Jacques Marescaux5, Paolo Fiorini1, Nicolas Padoy2
1Altair Robotics Lab, Department of Computer Science, University of Verona, Verona, Italy
2ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
3University Hospital of Strasbourg, IHU Strasbourg, France
4Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
5IRCAD, Strasbourg, France

Abstract

Automatic segmentation and classification of surgical activity is crucial for providing advanced support in computer-assisted interventions and autonomous functionalities in robot-assisted surgeries. Prior works have focused on recognizing either coarse activities, such as phases, or fine-grained activities, such as gestures. This work aims to jointly recognize two complementary levels of granularity directly from videos, namely phases and steps.

We introduce two correlated surgical activities, phases and steps, for the laparoscopic gastric bypass procedure. We propose a multi-task multi-stage temporal convolutional network (MTMS-TCN) along with a multi-task convolutional neural network (CNN) training setup to jointly predict the phases and steps and to benefit from their complementarity to better evaluate the execution of the procedure. We evaluate the proposed method on a large video dataset consisting of 40 surgical procedures (Bypass40).

We present experimental results from several baseline models for both phase and step recognition on Bypass40. The proposed MTMS-TCN method outperforms single-task methods in both phase and step recognition by 1-2% in accuracy, precision and recall. Furthermore, for step recognition, MTMS-TCN outperforms LSTM-based models by 3-6% on all metrics.

In this work, we present a multi-task multi-stage temporal convolutional network for surgical activity recognition, which shows improved results compared to single-task models on a gastric bypass dataset with multi-level annotations. The proposed method shows that the joint modeling of phases and steps improves the overall recognition of each type of activity.
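The abstract does not state how the phase and step objectives are combined during training. As a minimal sketch of joint prediction, assume every video frame carries one phase label and one step label, and that the per-frame cross-entropy terms for the two tasks are simply added. The function names and the `step_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])


def joint_loss(phase_logits, step_logits, phase_labels, step_labels,
               step_weight=1.0):
    """Mean per-frame loss: CE(phase) + step_weight * CE(step).

    Assumption: equal weighting by default; the abstract gives no weights.
    """
    n = len(phase_labels)
    total = 0.0
    for p_log, s_log, p_y, s_y in zip(phase_logits, step_logits,
                                      phase_labels, step_labels):
        total += cross_entropy(p_log, p_y)
        total += step_weight * cross_entropy(s_log, s_y)
    return total / n


# One frame, 2 hypothetical phases and 4 hypothetical steps, uniform scores.
loss = joint_loss([[0.0, 0.0]], [[0.0, 0.0, 0.0, 0.0]], [0], [1])
```

With uniform scores, the phase term equals ln 2 and the step term equals ln 4, so the joint loss is ln 8. Setting `step_weight=0.0` recovers a single-task phase loss, which is one simple way to compare single-task and multi-task training under this sketch.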
