Deep learning for detection of radiographic sacroiliitis: achieving expert-level performance

Keno K. Bressem1, Janis L. Vahldiek1, Lisa C. Adams1, Stefan M. Niehues1, Hildrun Haibel2, V. Rios Rodriguez2, Murat Torğutalp2, Mikhail Protopopov2, Fabian Proft2, Judith Rademacher3, Joachim Sieper2, Martin Rudwaleit4, Bernd Hamm1, Marcus R. Makowski5, Kay-Geert A. Hermann1, Denis Poddubnyy2
1Department of Radiology, Charité – Universitätsmedizin Berlin, Hindenburgdamm 30, 12203, Berlin, Germany
2Department of Gastroenterology, Infectious Diseases, and Rheumatology, Charité – Universitätsmedizin Berlin, Berlin, Germany
3Berlin Institute of Health (BIH), Berlin, Germany
4Department of Internal Medicine and Rheumatology, Klinikum Bielefeld Rosenhöhe, Bielefeld, Germany
5Department of Diagnostic and Interventional Radiology, School of Medicine, Technical University of Munich, Munich, Germany

Abstract

Background: Radiographs of the sacroiliac joints are commonly used for the diagnosis and classification of axial spondyloarthritis (axSpA). The aim of this study was to develop and validate an artificial neural network for the detection of definite radiographic sacroiliitis as a manifestation of axSpA.

Methods: Conventional radiographs of the sacroiliac joints obtained in two independent studies of patients with axSpA were used. The first cohort comprised 1553 radiographs and was split into a training set (n = 1324) and a validation set (n = 229). The second cohort comprised 458 radiographs and served as an independent test dataset. All radiographs were assessed in a central reading session, and the final decision on the presence or absence of definite radiographic sacroiliitis was used as the reference. The performance of the neural network was evaluated by calculating the areas under the receiver operating characteristic curves (AUCs), sensitivity and specificity. Cohen's kappa and absolute agreement were used to assess agreement between the neural network and the human readers.

Results: The neural network achieved excellent performance in detecting definite radiographic sacroiliitis, with AUCs of 0.97 for the validation dataset and 0.94 for the test dataset. At the cut-off that weights sensitivity and specificity equally, sensitivity was 88% and specificity 95% in the validation set. In the test set, sensitivity was 92% and specificity 81%. Cohen's kappa between the neural network and the reference judgements was 0.79 for the validation set and 0.72 for the test set. Absolute agreement was 90% and 88%, respectively.

Conclusion: Deep artificial neural networks enable accurate detection of definite radiographic sacroiliitis, which is relevant for the diagnosis and classification of axSpA.
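
The discrimination metrics reported above (AUC, plus sensitivity and specificity at a single cut-off) can be illustrated with a short Python sketch. It is not the authors' analysis code. The function names and toy scores are hypothetical. The sketch also assumes that "the cut-off weighting both measurements equally" means the threshold that maximizes Youden's J, defined as sensitivity + specificity - 1.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive case scores higher than a
    random negative case (ties count one half), i.e. the Mann-Whitney U
    statistic divided by n_pos * n_neg."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores: Sequence[float],
                  labels: Sequence[int]) -> Tuple[float, float, float]:
    """Assumed interpretation of the equal-weight cut-off: maximize
    J = sensitivity + specificity - 1 over all observed scores.
    Returns (cutoff, sensitivity, specificity); on ties the lowest cut-off wins."""
    best = None
    for c in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, c)
        j = se + sp - 1.0
        if best is None or j > best[0]:
            best = (j, c, se, sp)
    return best[1], best[2], best[3]


# Hypothetical network outputs (probability of sacroiliitis) and reference labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.1, 0.6]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(roc_auc(scores, labels))        # 0.9375
print(youden_cutoff(scores, labels))  # (0.4, 1.0, 0.75)
```

The pairwise loop costs O(n_pos × n_neg) comparisons. That is acceptable for a few hundred radiographs. For larger datasets, a rank-based formula or a library routine would scale better.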

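Agreement between the network and the reference readings can be sketched the same way. Absolute agreement p_o is the share of cases on which both raters give the same label. Cohen's kappa corrects p_o for the agreement p_e that would be expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The code below is a minimal illustration with hypothetical binary ratings, not the implementation used in the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def absolute_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Proportion of cases on which the two raters give the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = absolute_agreement(a, b)  # observed agreement (also validates input)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: sum over categories of the product of both raters'
    # marginal proportions for that category.
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[k] * count_b[k] for k in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings: 1 = definite radiographic sacroiliitis, 0 = absent.
network = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
reference = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
print(absolute_agreement(network, reference))      # 0.8
print(round(cohens_kappa(network, reference), 3))  # 0.583
```

In this toy example, the raters agree on 8 of 10 cases (p_o = 0.8). Both label 4 of 10 cases positive, so p_e = 0.4 × 0.4 + 0.6 × 0.6 = 0.52. Kappa is therefore well below the raw agreement.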