arXiv Open Access 2025

Performance of Large Language Models in Answering Critical Care Medicine Questions

Mahmoud Alwakeel, Aditya Nagori, An-Kwok Ian Wong, Neal Chaisson, Vijay Krishnamoorthy, Rishikesan Kamaleswaran

Abstract

Large Language Models have been tested on medical student-level questions, but their performance in specialized fields like Critical Care Medicine (CCM) is less explored. This study evaluated Meta-Llama 3.1 models (8B and 70B parameters) on 871 CCM questions. Llama3.1:70B outperformed 8B by 30%, with 60% average accuracy. Performance varied across domains, highest in Research (68.4%) and lowest in Renal (47.9%), highlighting the need for broader future work to improve models across various subspecialty domains.

Authors (6)

Mahmoud Alwakeel
Aditya Nagori
An-Kwok Ian Wong
Neal Chaisson
Vijay Krishnamoorthy
Rishikesan Kamaleswaran

Citation Format

Alwakeel, M., Nagori, A., Wong, A.I., Chaisson, N., Krishnamoorthy, V., Kamaleswaran, R. (2025). Performance of Large Language Models in Answering Critical Care Medicine Questions. https://arxiv.org/abs/2509.19344

Journal Information
Publication year: 2025
Language: English
Source database: arXiv
Access: Open Access