A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis

The Lancet Digital Health - Volume 1 - Pages e271-e297 - 2019
Xiaoxuan Liu1,2,3,4, Livia Faes3,5, Aditya U Kale1, Siegfried K Wagner6, Dun Jack Fu3, Alice Bruynseels1, Thushika Mahendiran1, Gabriella Moraes3, Mohith Shamdas2, Christoph Kern3,7, Joseph R Ledsam8, Martin K Schmid5, Konstantinos Balaskas3,6, Eric J Topol9, Lucas M Bachmann10, Pearse A Keane6,4, Alastair K Denniston1,2,11,6,4
1Department of Ophthalmology, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
2Academic Unit of Ophthalmology, Institute of Inflammation & Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
3Medical Retina Department, Moorfields Eye Hospital NHS Foundation Trust, London, UK
4Health Data Research UK, London, UK
5Eye Clinic, Cantonal Hospital of Lucerne, Lucerne, Switzerland
6NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK
7University Eye Hospital, Ludwig Maximilian University of Munich, Munich, Germany
8DeepMind, London, UK
9Scripps Research Translational Institute, La Jolla, California
10Medignition, Research Consultants, Zurich, Switzerland
11Centre for Patient Reported Outcome Research, Institute of Applied Health Research, University of Birmingham, Birmingham, UK
