Semantic Scholar | Open Access | 2025 | 4 citations

Textual Proficiency and Visual Deficiency: A Comparative Study of Large Language Models and Radiologists in MRI Artifact Detection and Correction.

Y. Gunes, T. Cesur, E. Çamur, B. E. Çifçi, Turan Kaya, +3 others

Abstract

RATIONALE AND OBJECTIVES: To assess the performance of Large Language Models (LLMs) in detecting and correcting MRI artifacts compared to radiologists using text-based and visual questions.

MATERIALS AND METHODS: This cross-sectional observational study included three phases. Phase 1 involved six LLMs (ChatGPT o1-preview, ChatGPT-4o, ChatGPT-4V, Google Gemini 1.5 Pro, Claude 3.5 Sonnet, Claude 3 Opus) and five radiologists (two residents, two junior radiologists, one senior radiologist) answering 42 text-based questions on MRI artifacts. In Phase 2, the same radiologists and five multimodal LLMs evaluated 100 MRI images, each containing a single artifact. Phase 3 reassessed the identical tasks 1.5 months later to evaluate temporal consistency. Responses were graded using 4-point Likert scales for "Management Score" (text-based) and "Correction Score" (visual). McNemar's test compared response accuracy, and the Wilcoxon test assessed score differences.

RESULTS: LLMs outperformed radiologists in text-based tasks, with ChatGPT o1-preview scoring the highest (3.71±0.60 in Round 1; 3.76±0.84 in Round 2) (p<0.05). In visual tasks, radiologists performed significantly better, with the senior radiologist achieving 92% and 94% accuracy in Rounds 1 and 2, respectively (p<0.05). The top-performing LLM (ChatGPT-4o) achieved only 20% and 18% accuracy. Correction Scores mirrored this difference, with radiologists consistently scoring higher than LLMs (p<0.05).

CONCLUSION: LLMs excel in text-based tasks but have notable limitations in visual artifact interpretation, making them unsuitable for independent diagnostics. They are promising as educational tools or adjuncts in "human-in-the-loop" systems, with multimodal AI improvements necessary to bridge these gaps.
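
The abstract states that McNemar's test was used to compare response accuracy on the same set of items. As a minimal sketch of how such a paired comparison can be computed, assuming the exact (binomial) form of the test, the Python below counts discordant pairs and derives a two-sided p-value. The function names and the toy correctness flags are hypothetical illustrations, not data or code from the study, and the abstract does not say which variant of the test the authors used.

```python
from math import comb


def discordant_counts(rater_a, rater_b):
    """Count items where exactly one of two raters answered correctly.

    rater_a and rater_b are equal-length sequences of booleans
    (True = correct) for the same items in the same order.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("paired ratings must have the same length")
    only_a = sum(1 for a, b in zip(rater_a, rater_b) if a and not b)
    only_b = sum(1 for a, b in zip(rater_a, rater_b) if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    Under the null hypothesis, each discordant pair is equally likely
    to favour either rater, so the smaller count follows
    Binomial(n, 0.5) with n = only_a + only_b.
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(only_a, only_b)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


if __name__ == "__main__":
    # Hypothetical correctness flags for ten items (NOT study data).
    reader = [True, True, True, False, True, True, True, True, False, True]
    model = [False, True, False, False, False, True, False, False, False, False]
    b, c = discordant_counts(reader, model)
    print(f"discordant pairs: {b} vs {c}, p = {mcnemar_exact(b, c):.4f}")
```

Only the discordant pairs enter the calculation; items that both raters got right, or both got wrong, carry no information about which rater is more accurate.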

Authors (8)

Y. Gunes
T. Cesur
E. Çamur
B. E. Çifçi
Turan Kaya
Mehmet Numan Colakoglu
Ural Koç
R. S. Okten

Citation Format

Gunes, Y., Cesur, T., Çamur, E., Çifçi, B.E., Kaya, T., Colakoglu, M.N. et al. (2025). Textual Proficiency and Visual Deficiency: A Comparative Study of Large Language Models and Radiologists in MRI Artifact Detection and Correction. https://doi.org/10.1016/j.acra.2025.01.004

Journal Information
Publication Year: 2025
Language: English
Total Citations: 4
Source Database: Semantic Scholar
DOI: 10.1016/j.acra.2025.01.004
Access: Open Access