Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

Public Library of Science (PLoS) - Volume 2, Issue 2 - Page e0000198

Tiffany H. Kung¹,², Morgan Cheatham³,⁴, Arielle Medenilla¹, Czarina Sillos¹, Lorie De Leon¹, Camille Elepaño¹, Maria Madriaga¹, Rimel Aggabao¹, Giezel Diaz-Candido¹, James Maningo¹, Victor Tseng¹,⁵

¹AnsibleHealth, Inc., Mountain View, California, United States of America

²Department of Anesthesiology, Massachusetts General Hospital, Harvard School of Medicine, Boston, Massachusetts, United States of America

³Brown University, Providence, Rhode Island, United States of America

⁴Warren Alpert Medical School

⁵Department of Medical Education, UWorld, LLC, Dallas, Texas, United States of America

Abstract

We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education and, potentially, clinical decision-making.
