An exploratory assessment of GPT-4o and GPT-4 performance on the Japanese National Dental Examination

The Saudi Dental Journal, Volume 36, Issue 12, December 2024, Pages 1577-1581

Abstract

Background and Objectives

Multiple large language models (LLMs) have been released since 2022, including OpenAI’s GPT-3.5 and GPT-4. The latest model, GPT-4o, introduced on May 13, 2024, is a significant improvement over GPT-4. Previous studies have shown the potential of LLMs as educational tools in medical and dental exams. This study evaluates the accuracy of GPT-4 and GPT-4o responses to the Japanese National Dental Examination (JNDE) to assess their potential as educational tools for dental education.

Materials and methods

We obtained the dataset of the 117th JNDE, administered in January 2024, consisting of 360 questions. After excluding questions containing images and those deemed inappropriate, 202 questions were selected. GPT-4 and GPT-4o were used to generate responses. Standardized prompts ensured consistent input. Data were analyzed with Qlik Sense® and GraphPad Prism using Fisher’s exact test.
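As an illustration of the statistical comparison named above, the following is a minimal sketch of a two-sided Fisher’s exact test on a 2×2 table of correct and incorrect answers, written in plain Python. It is not the authors’ analysis, which was run in GraphPad Prism. The function name is hypothetical, and the counts in the usage lines are assumptions back-calculated from the reported overall percentages on 202 questions; the abstract does not report raw counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two models; columns are correct / incorrect answers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# ASSUMED counts, approximated from 73.8% and 63.3% of 202 questions;
# these are not figures reported in the abstract.
gpt4o_correct, gpt4_correct, n_questions = 149, 128, 202

p_value = fisher_exact_two_sided(
    gpt4o_correct, n_questions - gpt4o_correct,
    gpt4_correct, n_questions - gpt4_correct,
)
print(f"two-sided p = {p_value:.4f}")
```

The sketch treats the two models’ answers as independent samples, which is what Fisher’s exact test assumes; the same function could be applied to the compulsory and general sections if their counts were available.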

Results

GPT-4o showed a significantly higher correct response rate (73.8%) than GPT-4 (63.3%). In the compulsory section, GPT-4o achieved 88.6% accuracy, significantly higher than GPT-4’s 74.3%. Though not statistically significant, the general section saw an improvement with GPT-4o (66.4%) over GPT-4 (58.0%).

Conclusion

GPT-4o significantly outperformed GPT-4 in accuracy for JNDE questions, suggesting its improved potential as an educational tool in dental education. Further studies are needed to evaluate GPT-4o’s capabilities with visual materials and in diverse question sets to fully ascertain its utility in educational settings.

Keywords

GPT-4o

GPT-4

Japanese National Dental Examination

Education tool

© 2024 THE AUTHORS. Published by Elsevier B.V. on behalf of King Saud University.
