Sailing the Seven Seas: A Multinational Comparison of ChatGPT’s Performance on Medical Licensing Examinations

Springer Science and Business Media LLC - Pages 1-4 - 2023

Michael Alfertshofer¹, Cosima C. Hoch², Paul F. Funk³, Katharina Hollmann⁴, Barbara Wollenberg², Samuel Knoedler⁵, Leonard Knoedler⁵

¹Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians University Munich, Munich, Germany

²Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich (TUM), Munich, Germany

³Department of Otolaryngology, Head and Neck Surgery, University Hospital Jena, Friedrich Schiller University Jena, Jena, Germany

⁴Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, USA

⁵Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany

Abstract

The use of AI-powered technology, particularly OpenAI’s ChatGPT, holds significant potential to reshape healthcare and medical education. Although existing studies have examined the performance of ChatGPT on medical licensing examinations in individual nations, a comprehensive, multinational analysis using rigorous methodology is currently lacking. Our study sought to address this gap by evaluating the performance of ChatGPT on six different national medical licensing examinations and investigating the relationship between test question length and ChatGPT’s accuracy. We manually entered a total of 1,800 test questions (300 each from the US, Italian, French, Spanish, UK, and Indian medical licensing examinations) into ChatGPT and recorded the accuracy of its responses. We found significant variance in ChatGPT’s test accuracy across countries, with the highest accuracy seen in the Italian examination (73% correct answers) and the lowest in the French examination (22% correct answers). Interestingly, question length correlated with ChatGPT’s performance in the Italian and French state examinations only. In addition, the study revealed that questions requiring multiple correct answers, as seen in the French examination, posed a greater challenge to ChatGPT. Our findings underscore the need for future research to further delineate ChatGPT’s strengths and limitations in medical test-taking across additional countries and to develop guidelines to prevent AI-assisted cheating in medical examinations.
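The two quantities named in the abstract, per-examination accuracy and the association between question length and correctness, can be illustrated with a minimal Python sketch. The abstract does not state which statistic the authors used, so the point-biserial (Pearson) correlation here is an assumption, and the function names and toy data are hypothetical and are not study results.

```python
from math import sqrt


def accuracy(correct):
    """Fraction of answers flagged correct (1) among all answers."""
    return sum(correct) / len(correct)


def point_biserial(lengths, correct):
    """Pearson correlation between question length and a 0/1 correctness flag.

    Assumption: the abstract does not name the statistic; this is one
    common choice for relating a continuous and a binary variable.
    """
    n = len(lengths)
    mean_len = sum(lengths) / n
    mean_cor = sum(correct) / n
    cov = sum((l - mean_len) * (c - mean_cor) for l, c in zip(lengths, correct))
    var_len = sum((l - mean_len) ** 2 for l in lengths)
    var_cor = sum((c - mean_cor) ** 2 for c in correct)
    if var_len == 0 or var_cor == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_len * var_cor)


# Hypothetical toy data (NOT from the study): question length in
# characters and whether the answer was scored as correct.
toy_lengths = [120, 340, 95, 410, 220, 180]
toy_correct = [1, 0, 1, 0, 1, 1]

print(f"accuracy = {accuracy(toy_correct):.2f}")
print(f"r = {point_biserial(toy_lengths, toy_correct):.2f}")
```

In this toy sample the longer questions are the ones answered incorrectly, so the coefficient is negative; a real analysis would be run separately on each examination's 300 questions and paired with a significance test.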

