Evaluating the Performance of ChatGPT in Ophthalmology
Abstract
We tested the accuracy of ChatGPT, a large language model (LLM), on ophthalmology question answering using two popular multiple-choice question banks used to prepare for the high-stakes Ophthalmic Knowledge Assessment Program (OKAP) exam. The test sets were of easy-to-moderate difficulty and covered a range of question types, including recall, interpretation, practical, and clinical decision-making problems. ChatGPT achieved 55.8% and 42.7% accuracy on the two 260-question simulated exams. Its performance varied across subspecialties: it did best in general medicine and worst in neuro-ophthalmology and in ophthalmic pathology and intraocular tumors. These results are encouraging but suggest that specialising LLMs through domain-specific pre-training may be necessary to improve their performance in ophthalmic subspecialties.
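The accuracy figures above are simple proportions of correctly answered questions. As a rough consistency check, assuming each simulated exam has exactly 260 scored questions, the only whole-number counts that round to the reported percentages are 145/260 (55.8%) and 111/260 (42.7%). The minimal Python sketch below performs that check and shows one way a per-subspecialty breakdown could be tallied; the function names, the `graded` records, and their labels are hypothetical placeholders for illustration, not the study's code or data.

```python
from collections import defaultdict


def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage of total questions."""
    return 100.0 * correct / total


def implied_correct_counts(reported_pct: float, total: int) -> list[int]:
    """List every integer count of correct answers that rounds to reported_pct (1 decimal place)."""
    return [k for k in range(total + 1) if round(accuracy(k, total), 1) == reported_pct]


def accuracy_by_group(records):
    """Map each group (e.g. subspecialty) to its percentage of correct answers.

    records: iterable of (group_label, is_correct) pairs.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for group, is_correct in records:
        counts[group] += 1
        hits[group] += int(is_correct)
    return {group: accuracy(hits[group], counts[group]) for group in counts}


if __name__ == "__main__":
    # Back out the implied number of correct answers for each 260-question exam.
    for pct in (55.8, 42.7):
        print(pct, implied_correct_counts(pct, 260))
    # Hypothetical graded answers, for illustration only (not study data).
    graded = [
        ("general medicine", True),
        ("general medicine", True),
        ("neuro-ophthalmology", False),
        ("neuro-ophthalmology", True),
    ]
    print(accuracy_by_group(graded))
```

Grouping by a label before dividing keeps each subspecialty's denominator separate, so a subspecialty with few questions is not scored against the whole exam.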
Authors (5)
F. Antaki
Samir Touma
D. Milad
J. El-Khoury
R. Duval
Quick Access
- Publication Year: 2023
- Language: English
- Total Citations: 444
- Source Database: Semantic Scholar
- DOI: 10.1101/2023.01.22.23284882
- Access: Open Access