ChatGPT performed at or near the passing threshold for all three exams of USMLE without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models (LLMs) may have the potential to assist with medical education, and potentially, clinical decision-making. In this study, authors evaluate the performance of ChatGPT, a non-domain specific LLM, on its ability to perform clinical reasoning by testing its performance on questions from the United States Medical Licensing Examination (USMLE).
The USMLE is a high-stakes, comprehensive three-step standardized testing program covering all topics in physicians’ fund of knowledge, spanning basic science, clinical reasoning, medical management, and bioethics. The difficulty and complexity of questions is highly standardized and regulated, making it an ideal input substrate for AI testing.
ChatGPT is a general Large Language Model (LLM) developed by OpenAI. While the previous class of AI models have primarily been Deep Learning (DL) models, which are designed to learn and recognize patterns in data, LLMs are a new type of AI algorithm trained to predict the likelihood of a given sequence of words based on the context of the words that come before it.