Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy

The BMJ, n1872
Karoline Freeman1, Julia Geppert1, Chris Stinton1, Daniel Todkill1, Samantha Johnson1, Aileen Clarke1, Sian Taylor‐Phillips1
1Division of Health Sciences, University of Warwick, Coventry, UK

Abstract

Objective To examine the accuracy of artificial intelligence (AI) for the detection of breast cancer in mammography screening practice.

Design Systematic review of test accuracy studies.

Data sources Medline, Embase, Web of Science, and Cochrane Database of Systematic Reviews from 1 January 2010 to 17 May 2021.

Eligibility criteria Studies reporting test accuracy of AI algorithms, alone or in combination with radiologists, to detect cancer in women's digital mammograms in screening practice, or in test sets. Reference standard was biopsy with histology or follow-up (for screen negative women). Outcomes included test accuracy and cancer type detected.

Study selection and synthesis Two reviewers independently assessed articles for inclusion and assessed the methodological quality of included studies using the QUality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. A single reviewer extracted data, which were checked by a second reviewer. Narrative data synthesis was performed.

Results Twelve studies totalling 131 822 screened women were included. No prospective studies measuring test accuracy of AI in screening practice were found. Studies were of poor methodological quality. Three retrospective studies compared AI systems with the clinical decisions of the original radiologist, including 79 910 women, of whom 1878 had screen detected cancer or interval cancer within 12 months of screening. Thirty-four (94%) of 36 AI systems evaluated in these studies were less accurate than a single radiologist, and all were less accurate than consensus of two or more radiologists. Five smaller studies (1086 women, 520 cancers) at high risk of bias and low generalisability to the clinical context reported that all five evaluated AI systems (as standalone to replace radiologist or as a reader aid) were more accurate than a single radiologist reading a test set in the laboratory. In three studies, AI used for triage screened out 53%, 45%, and 50% of women at low risk but also 10%, 4%, and 0% of cancers detected by radiologists.

Conclusions Current evidence for AI does not yet allow judgement of its accuracy in breast cancer screening programmes, and it is unclear where on the clinical pathway AI might be of most benefit. AI systems are not sufficiently specific to replace radiologist double reading in screening programmes. Promising results in smaller studies are not replicated in larger studies. Prospective studies are required to measure the effect of AI in clinical practice. Such studies will require clear stopping rules to ensure that AI does not reduce programme specificity.

Study registration Protocol registered as PROSPERO CRD42020213590.
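The accuracy figures reported above all reduce to counts in a 2×2 table of test decisions against the reference standard, and the triage results trade workload removed against cancers that would have been screened out before radiologist reading. The sketch below illustrates how those quantities are computed; it is not taken from the review, and all numbers in it are hypothetical.

```python
# Illustrative sketch only: how sensitivity, specificity, and the triage
# trade-off discussed in the abstract are computed from a 2x2 confusion
# matrix. All numbers are hypothetical, not data from the review.

def test_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return sensitivity and specificity for a screening test."""
    return {
        "sensitivity": tp / (tp + fn),   # cancers correctly flagged
        "specificity": tn / (tn + fp),   # cancer-free women correctly cleared
    }

def triage_trade_off(n_screens: int, n_triaged_out: int,
                     cancers_detected_by_radiologists: int,
                     cancers_in_triaged_out_group: int) -> dict:
    """Workload removed by AI pre-screening vs cancers it would have missed."""
    return {
        "workload_removed": n_triaged_out / n_screens,
        "cancers_missed": cancers_in_triaged_out_group
                          / cancers_detected_by_radiologists,
    }

if __name__ == "__main__":
    # Hypothetical stand-alone AI reader on 10 000 screens containing 60 cancers.
    print(test_accuracy(tp=48, fp=400, fn=12, tn=9540))
    # Hypothetical triage rule that removes half of screens but 5% of cancers.
    print(triage_trade_off(n_screens=10_000, n_triaged_out=5_000,
                           cancers_detected_by_radiologists=60,
                           cancers_in_triaged_out_group=3))
```

In this framing, a triage rule is only attractive if the workload removed is large while the proportion of cancers screened out stays close to zero, which is why the conclusions emphasise stopping rules to protect programme specificity and sensitivity.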
