Disparities in dermatology AI performance on a diverse, curated clinical image set

Science Advances - Volume 8, Issue 32 - 2022
Roxana Daneshjou1,2, Kailas Vodrahalli1,2, Weixin Liang1,2, Melissa J. Jenkins1,2, Veronica Rotemberg1,2, Justin Ko1,2, Susan M. Swetter1,2, Elizabeth E. Bailey1,2, Olivier Gevaert1,2, Pritam Mukherjee1,2, Michelle Phung1,2, Kiana Yekrang1,2, Bradley Fong1,2, Rachna Sahasrabudhe1,2, James Zou1,2, Albert S. Chiou1,2
1Memorial Sloan Kettering, NY, 10065
2Stanford University, CA, 94305

Abstract

An estimated 3 billion people lack access to dermatological care globally. Artificial intelligence (AI) may aid in triaging skin diseases and identifying malignancies. However, most AI models have not been assessed on images of diverse skin tones or uncommon diseases. Thus, we created the Diverse Dermatology Images (DDI) dataset—the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones. We show that state-of-the-art dermatology AI models exhibit substantial limitations on the DDI dataset, particularly on dark skin tones and uncommon diseases. We find that dermatologists, who often label AI datasets, also perform worse on images of dark skin tones and uncommon diseases. Fine-tuning AI models on the DDI images closes the performance gap between light and dark skin tones. These findings identify important weaknesses and biases in dermatology AI that should be addressed for reliable application to diverse patients and diseases.

