TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos

Springer Science and Business Media LLC - Volume 18 - Pages 1665-1672 - 2023
Sanat Ramesh1,2, Diego Dall’Alba1, Cristians Gonzalez3,4, Tong Yu2, Pietro Mascagni4,5, Didier Mutter3,6,4, Jacques Marescaux6, Paolo Fiorini1, Nicolas Padoy2,4
1Altair Robotics Lab, University of Verona, Verona, Italy
2ICube, University of Strasbourg, CNRS, Strasbourg, France
3University Hospital of Strasbourg, Strasbourg, France
4Institute of Image-Guided Surgery, IHU Strasbourg, Strasbourg, France
5Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
6IRCAD, Strasbourg, France

Abstract

Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve generalization. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges because they are composed of multiple, interconnected, long-duration activities. This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos. It treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN. The effectiveness of the proposed method is demonstrated on two surgical video datasets, Bypass40 and CATARACTS, and two tasks, surgical phase recognition and step recognition. TRandAugment adds a performance boost of 1–6% over previous state-of-the-art methods, which use manually designed augmentations. This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
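The segment-wise idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how segments are formed, which transformations are used, or how their strength is chosen. The equal-length segmentation, the three toy transforms, the `max_magnitude` parameter, and all function names below are assumptions made purely for illustration. The only property taken from the abstract is that one randomly drawn transformation is applied consistently to every frame within a segment.

```python
import random
from typing import Callable, List, Tuple

# A frame is a toy grayscale image: rows of pixel values in [0, 1].
Frame = List[List[float]]


def identity(frame: Frame, magnitude: float) -> Frame:
    """Return an unchanged copy of the frame."""
    return [row[:] for row in frame]


def horizontal_flip(frame: Frame, magnitude: float) -> Frame:
    """Mirror each row; the magnitude is ignored for this transform."""
    return [row[::-1] for row in frame]


def brightness(frame: Frame, magnitude: float) -> Frame:
    """Shift every pixel by `magnitude`, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, p + magnitude)) for p in row] for row in frame]


# Hypothetical transform pool; the paper's actual pool is not given in the abstract.
TRANSFORMS: List[Callable[[Frame, float], Frame]] = [
    identity,
    horizontal_flip,
    brightness,
]


def split_into_segments(num_frames: int, num_segments: int) -> List[Tuple[int, int]]:
    """Return contiguous, near-equal (start, end) index ranges covering all frames."""
    bounds = [(i * num_frames) // num_segments for i in range(num_segments + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_segments)]


def augment_video(
    frames: List[Frame],
    num_segments: int,
    max_magnitude: float,
    rng: random.Random,
) -> List[Frame]:
    """Apply one randomly drawn transform, with one magnitude, per temporal segment."""
    augmented: List[Frame] = []
    for start, end in split_into_segments(len(frames), num_segments):
        # Sample once per segment so every frame in it is transformed identically.
        transform = rng.choice(TRANSFORMS)
        magnitude = rng.uniform(-max_magnitude, max_magnitude)
        augmented.extend(transform(f, magnitude) for f in frames[start:end])
    return augmented
```

Sampling the transform and its magnitude once per segment, rather than once per frame, is what keeps the augmentation temporally consistent inside a segment while still varying across the video. Passing an explicit `random.Random` instance makes a run reproducible for a given seed.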
