Combining weakly and strongly supervised learning improves strong supervision in Gleason pattern classification

BMC Medical Imaging - Volume 21 - Pages 1-14 - 2021
Sebastian Otálora1,2, Niccolò Marini1,2, Henning Müller1,3, Manfredo Atzori1,4
1HES-SO Valais, Sierre, Switzerland
2Computer Science Centre (CUI), University of Geneva, Carouge, Switzerland
3Faculty of Medicine, University of Geneva, Geneva, Switzerland
4Department of Neuroscience, University of Padova, Padova, Italy

Abstract

One challenge in training deep convolutional neural network (CNN) models with whole slide images (WSIs) is providing the required large number of costly, manually annotated image regions. Strategies to alleviate the scarcity of annotated data include transfer learning, data augmentation and training the models with less expensive image-level annotations (weakly supervised learning). However, it is not clear how to combine transfer learning in a CNN model when different data sources are available for training, or how to leverage the combination of large amounts of weakly annotated images with a set of local region annotations. This paper aims to evaluate CNN training strategies based on transfer learning to leverage the combination of weak and strong annotations in heterogeneous data sources. The trade-off between classification performance and annotation effort is explored by evaluating a CNN that learns from strong labels (region annotations) and is later fine-tuned on a dataset with less expensive weak (image-level) labels. As expected, the model performance on strongly annotated data steadily increases with the percentage of strong annotations used, reaching a performance comparable to pathologists ($\kappa = 0.691 \pm 0.02$). Nevertheless, the performance drops sharply when the model is applied to the WSI classification scenario ($\kappa = 0.307 \pm 0.133$), and it remains lower regardless of the number of annotations used. The performance increases when the model is fine-tuned for the task of Gleason scoring with the weak WSI labels ($\kappa = 0.528 \pm 0.05$). Combining weak and strong supervision improves strong supervision in the classification of Gleason patterns using tissue microarrays (TMA) and WSI regions. Our results provide effective strategies for training CNN models that combine limited annotated data with heterogeneous data sources. The performance increases in the controlled TMA scenario with the number of annotations used to train the model. Nevertheless, the performance is hindered when the trained TMA model is applied directly to the more challenging WSI classification problem. This demonstrates that a good pre-trained model for prostate cancer TMA image classification may lead to the best downstream model if fine-tuned on the WSI target dataset. The source code for reproducing the experiments in the paper is available at: https://github.com/ilmaro8/Digital_Pathology_Transfer_Learning
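The two-stage strategy described above (strong supervision on region annotations, followed by fine-tuning on weak, image-level labels) can be summarised in a short training sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation (which is in the linked repository): the MobileNetV2 backbone, the hyperparameters, the four-class setup and the loader names (`strong_loader`, `weak_loader`, `test_loader`) are hypothetical, and agreement is scored with quadratic Cohen's kappa, consistent with the reported $\kappa$ values.

```python
# Minimal sketch of strong-then-weak transfer learning for Gleason pattern
# classification. Dataset loaders are assumed to yield (patch_tensor, label)
# pairs; they are placeholders, not part of the paper's released code.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import cohen_kappa_score

NUM_CLASSES = 4  # e.g. benign + Gleason patterns 3, 4, 5 (assumed class setup)
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


def build_model() -> nn.Module:
    # ImageNet-pretrained backbone as the transfer-learning starting point
    # (requires torchvision >= 0.13 for the `weights` argument).
    model = models.mobilenet_v2(weights="IMAGENET1K_V1")
    model.classifier[1] = nn.Linear(model.last_channel, NUM_CLASSES)
    return model


def train(model: nn.Module, loader, epochs: int, lr: float) -> nn.Module:
    # The same loop is reused for both stages: strong labels (region
    # annotations) first, then weak slide-level labels propagated to patches.
    model.to(DEVICE).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for patches, labels in loader:
            patches, labels = patches.to(DEVICE), labels.to(DEVICE)
            optimizer.zero_grad()
            loss = criterion(model(patches), labels)
            loss.backward()
            optimizer.step()
    return model


@torch.no_grad()
def quadratic_kappa(model: nn.Module, loader) -> float:
    # Quadratically weighted Cohen's kappa between predictions and labels.
    model.to(DEVICE).eval()
    preds, targets = [], []
    for patches, labels in loader:
        preds.extend(model(patches.to(DEVICE)).argmax(1).cpu().tolist())
        targets.extend(labels.tolist())
    return cohen_kappa_score(targets, preds, weights="quadratic")


# Usage (loaders are hypothetical):
# model = train(build_model(), strong_loader, epochs=10, lr=1e-4)  # strong stage
# model = train(model, weak_loader, epochs=5, lr=1e-5)             # weak fine-tuning
# print(quadratic_kappa(model, test_loader))
```

Reusing one training loop for both stages reflects the idea in the abstract that the weakly labelled WSI data are treated as an additional, cheaper supervision source for fine-tuning rather than as a separate model.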
