Comparison of ChatGPT Plus (version 4.0) and a pretrained AI model (Orthopod) on the Orthopaedic In-Training Examination (OITE)

The emergence of readily accessible large language model (LLM) artificial intelligence (AI) software in recent years has generated significant interest within the academic community. Computer chatbots like ELIZA have been around for decades [1,2], and continued advancement in this technology has culminated in the strikingly more capable AI-powered chatbots seen today. Since its initial launch in November 2022, the OpenAI chatbot Chat Generative Pretrained Transformer (ChatGPT) has garnered excitement over its wide range of uses [3]. ChatGPT now sees nearly 100 million weekly active users, who rely on its easy-to-use interface to answer questions, proofread emails, plan vacation itineraries, and much more [3,4].

The ability of ChatGPT to analyze language and provide detailed explanations for complex questions has sparked interest in its use as an educational tool. Recent studies have demonstrated promising results with respect to ChatGPT's performance on national standardized tests within the medical domain [5,6,7]. In early 2023, Kung et al. reported on ChatGPT's ability to obtain passing scores of 75.0 %, 61.5 %, and 68.8 % on the United States Medical Licensing Examination (USMLE) Step 1, Step 2CK, and Step 3 exams, respectively [5]. Additional studies, such as those by Lewandowski et al. and Chen et al., have further demonstrated ChatGPT's capacity to surpass established passing scores (60 %) on specialty-specific board-style examination questions in dermatology and neurology, respectively [6,7].

Given its standardized nature and regular use by orthopaedic residents, the Orthopaedic In-Training Examination (OITE) is an ideal benchmark for testing LLM AIs like ChatGPT [8]. Two research groups have evaluated ChatGPT's ability to answer questions derived from the OITE [9,10]. Both studies demonstrated an impressive capacity of ChatGPTv4 to answer OITE questions, but its scores were limited to 61.2 % [9] and 47.2 % [10] of questions answered correctly.

Despite the impressive performance of LLM AIs on these standardized exams, there remains considerable room for improvement. The primary objective of this study was to assess whether ChatGPT's performance on the OITE could be improved by using a custom-trained ChatGPT model built from OITE preparatory books (Orthopod).
