Artificial Intelligence Versus Medical Students in General Surgery Exam

Artificial Intelligence (AI) has become increasingly involved in daily activities, social networks, and healthcare in recent years. Our aim in this study was to examine the performance of AI programs on a general surgery exam and to compare their results with each other and with those of medical students. Thirty questions in Turkish were asked to 30 volunteer fourth-year medical students and simultaneously to four AI programs (ChatGPT-3.5 (Chat Generative Pre-Trained Transformer), ChatGPT-4, Google Bard, and Bing). The questions were multiple-choice with five options, covered both theoretical (n = 15) and clinical (n = 15) topics, and had not been used anywhere before. The rate of correct answers for the medical students and the AI programs, and the time taken by the AI programs to answer the questions, were examined and compared. ChatGPT-4 gave the most correct answers to all questions and to the theoretical questions (66.7% and 53.3%, respectively), while ChatGPT-3.5 was the most successful on the clinical questions (86.7%). The comparisons revealed no significant differences between the medical students and the AI programs, or among the AI programs themselves. In terms of response time, ChatGPT-3.5 was the fastest AI program for all questions as well as for the theoretical and clinical subsets. The fact that the AI programs gave more correct answers than the medical students on clinical questions may suggest that their clinical medical knowledge is stronger than their theoretical medical knowledge. We would also like to emphasize that ChatGPT-3.5 was the fastest AI program at answering all questions, both theoretical and clinical.
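
The abstract does not state which statistical test was used for the comparisons; the following is a minimal, illustrative sketch (not the authors' analysis) of how correct-versus-incorrect answer counts out of 30 questions could be compared between two responders, assuming `scipy` is available. The student counts shown are hypothetical placeholders, as the abstract does not report the students' exact scores.

```python
# Illustrative sketch only: compares correct/incorrect counts out of 30
# questions between two responders. The choice of test and the student
# counts are assumptions, not taken from the study.
from scipy.stats import chi2_contingency, fisher_exact

chatgpt4 = (20, 10)   # 20/30 correct (66.7%, as reported for all questions)
students = (17, 13)   # hypothetical placeholder counts

table = [list(chatgpt4), list(students)]

chi2, p_chi2, dof, _ = chi2_contingency(table)   # chi-square test of independence
odds_ratio, p_fisher = fisher_exact(table)       # exact test, better suited to small counts

print(f"chi-square p = {p_chi2:.3f}, Fisher's exact p = {p_fisher:.3f}")
```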
