Key challenges for delivering clinical impact with artificial intelligence

BMC Medicine - Volume 17, Issue 1 - 2019
Christopher Kelly1, Alan Karthikesalingam1, Mustafa Suleyman2, Greg S. Corrado3, Dominic King1
1Google Health, London, UK
2DeepMind, London, UK
3Google Health, California, USA

Abstract

Background

Artificial intelligence (AI) research in healthcare is accelerating rapidly, with potential applications demonstrated across many domains of medicine. However, there are currently only a limited number of examples of such techniques being successfully deployed into clinical practice. This article explores the main challenges and limitations of AI in healthcare and considers the steps required to translate these potentially transformative technologies from research to clinical practice.

Main body

Key challenges for the translation of AI systems in healthcare include those intrinsic to the science of machine learning, logistical difficulties in implementation, and consideration of the barriers to adoption as well as of the necessary sociocultural or pathway changes. Robust clinical evaluation through randomised controlled trials should be viewed as the gold standard for evidence generation, but conducting such trials in practice may not always be appropriate or feasible. Performance metrics should aim to capture real clinical applicability and be understandable to intended users. Regulation that balances the pace of innovation with the potential for harm, alongside thoughtful post-market surveillance, is required to ensure that patients are neither exposed to dangerous interventions nor deprived of access to beneficial innovations. Mechanisms to enable direct comparisons of AI systems must be developed, including the use of independent, local and representative test sets. Developers of AI algorithms must be vigilant to potential dangers, including dataset shift, accidental fitting of confounders, unintended discriminatory bias, the challenges of generalisation to new populations, and the unintended negative consequences of new algorithms on health outcomes.
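One way to see why performance metrics need to reflect the target population, and why local and representative test sets matter, is to look at how positive predictive value (PPV) moves with disease prevalence. The following is a minimal sketch rather than a method from the article: the function name and the sensitivity, specificity and prevalence figures are hypothetical and chosen only for illustration.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Return P(disease | positive test) using Bayes' rule.

    All three inputs are proportions in the range [0, 1].
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    total_positive_rate = true_positive_rate + false_positive_rate
    if total_positive_rate == 0.0:
        raise ValueError("PPV is undefined when no test result is positive")
    return true_positive_rate / total_positive_rate


if __name__ == "__main__":
    # Hypothetical operating point of a classifier (assumed, not from the article).
    SENSITIVITY = 0.90
    SPECIFICITY = 0.90

    # The same model evaluated in settings with different disease prevalence,
    # for example a specialist clinic versus population screening.
    for prevalence in (0.30, 0.05, 0.01):
        ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

With sensitivity and specificity held fixed, the share of positive results that are true positives falls as prevalence falls, so a model validated in a high-prevalence cohort can look much less useful to clinicians in a screening population.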

Conclusion

The safe and timely translation of AI research into clinically validated and appropriately regulated systems that can benefit everyone is challenging. Robust clinical evaluation, using metrics that are intuitive to clinicians and that ideally go beyond measures of technical accuracy to include quality of care and patient outcomes, is essential. Further work is required (1) to identify themes of algorithmic bias and unfairness while developing mitigations to address them, (2) to reduce brittleness and improve generalisability, and (3) to develop methods for improved interpretability of machine learning predictions. If these goals can be achieved, the benefits for patients are likely to be transformational.

Keywords

#artificial intelligence #healthcare #technology translation #clinical challenges #peer review #algorithmic bias
