Deep Learning in Mammography

Investigative Radiology - Volume 52, Issue 7 - Pages 434-440 - 2017
Anton S. Becker1,2,3,4, Magda Marcon2,3,4, Soleen Ghafoor2,3,4, Moritz C. Wurnig2,3,4, Thomas Frauenfelder2,3,4, Andreas Boss2,3,4
1Anton S. Becker, MD, Institute of Diagnostic and Interventional Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich, Switzerland. E-mail: [email protected].
2Conflicts of interest and sources of funding: none declared.
3Supplemental digital contents are available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s Web site (www.investigativeradiology.com).
4Institute of Diagnostic and Interventional Radiology, University Hospital Zurich, Zurich, Switzerland.

Abstract

Objectives: The aim of this study was to evaluate the diagnostic accuracy of multipurpose image analysis software based on deep learning with artificial neural networks for the detection of breast cancer in an independent, dual-center mammography data set.

Materials and Methods: In this retrospective, Health Insurance Portability and Accountability Act-compliant study, all patients undergoing mammography in 2012 at our institution were reviewed (n = 3228). All of their prior and follow-up mammograms from a time span of 7 years (2008–2015) were considered as a reference for clinical diagnosis. After applying exclusion criteria (missing reference standard, prior procedures or therapies), patients with the first diagnosis of a malignancy or borderline lesion were selected (n = 143). Histology or clinical long-term follow-up served as the reference standard. In a first step, a breast density- and age-matched control cohort was selected (n = 143) from the remaining patients with more than 2 years of follow-up (n = 1003). The neural network was trained with this data set. From the publicly available Breast Cancer Digital Repository data set, patients with cancer and a matched control cohort were selected (n = 35 × 2). The performance of the trained neural network was also tested with this external data set. Three radiologists (3, 5, and 10 years of experience) evaluated the test data set. In a second step, the neural network was trained with all cases from January to September and tested with cases from October to December 2012 (screening-like cohort). The radiologists also evaluated this second test data set. The areas under the receiver operating characteristic curve (AUC) of the readers and the neural network were compared. A Bonferroni-corrected P value of less than 0.016 was considered statistically significant.

Results: Mean age of patients with a lesion was 59.6 years (range, 35–88 years) and of controls, 59.1 years (range, 35–83 years). Breast density distribution (A/B/C/D) was 21/59/42/21 and 22/60/41/20, respectively. Histologic diagnoses were invasive ductal carcinoma in 90, ductal carcinoma in situ in 13, invasive lobular carcinoma in 13, mucinous carcinoma in 3, and borderline lesion in 12 patients. In the first step, the AUC of the trained neural network was 0.81, and it was comparable on the test cases (0.79, P = 0.63). One of the radiologists showed almost equal performance (0.83, P = 0.17), whereas 2 were significantly better (0.91 and 0.94, P < 0.016). In the second step, the performance of the neural network (0.82) was not significantly different from that of the radiologists (0.77–0.87, P > 0.016); however, the radiologists were consistently less sensitive and more specific than the neural network.

Conclusions: Current state-of-the-art artificial neural networks for general image analysis are able to detect cancer in mammograms with accuracy similar to that of radiologists, even in a screening-like cohort with low breast cancer prevalence.
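The quantities reported above (area under the ROC curve, sensitivity, specificity, and a Bonferroni-corrected significance threshold) can be illustrated with a minimal Python sketch. It is not the authors' software or analysis pipeline. The function names, the toy labels and scores, and the decision threshold are all hypothetical. The reading of 0.016 as 0.05 divided by 3 comparisons is an assumption; the abstract does not state how the threshold was derived.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5). Equivalent to the Mann-Whitney U statistic
    divided by (n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical toy data: 1 = lesion, 0 = control; scores are made up.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.45)
# Assumption: 3 comparisons at a family-wise alpha of 0.05.
alpha_per_test = bonferroni_alpha(0.05, 3)

print(f"AUC = {auc:.3f}")
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
print(f"Bonferroni threshold = {alpha_per_test:.4f}")
```

Lowering the hypothetical decision threshold in this sketch raises sensitivity and lowers specificity. This is the kind of trade-off behind the finding that the radiologists were less sensitive but more specific than the network at similar AUC.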
