Semantic Scholar Open Access 2025 4 citations

Textual Proficiency and Visual Deficiency: A Comparative Study of Large Language Models and Radiologists in MRI Artifact Detection and Correction.

Y. Gunes, T. Cesur, E. Çamur, B. E. Çifçi, Turan Kaya, +3 others

Abstract

RATIONALE AND OBJECTIVES To assess the performance of Large Language Models (LLMs) in detecting and correcting MRI artifacts compared to radiologists using text-based and visual questions.

MATERIALS AND METHODS This cross-sectional observational study included three phases. Phase 1 involved six LLMs (ChatGPT o1-preview, ChatGPT-4o, ChatGPT-4V, Google Gemini 1.5 Pro, Claude 3.5 Sonnet, Claude 3 Opus) and five radiologists (two residents, two junior radiologists, one senior radiologist) answering 42 text-based questions on MRI artifacts. In Phase 2, the same radiologists and five multimodal LLMs evaluated 100 MRI images, each containing a single artifact. Phase 3 reassessed the identical tasks 1.5 months later to evaluate temporal consistency. Responses were graded using 4-point Likert scales for "Management Score" (text-based) and "Correction Score" (visual). McNemar's test compared response accuracy, and the Wilcoxon test assessed score differences.

RESULTS LLMs outperformed radiologists in text-based tasks, with ChatGPT o1-preview scoring the highest (3.71±0.60 in Round 1; 3.76±0.84 in Round 2) (p<0.05). In visual tasks, radiologists performed significantly better, with the Senior Radiologist achieving 92% and 94% accuracy in Rounds 1 and 2, respectively (p<0.05). The top-performing LLM (ChatGPT-4o) achieved only 20% and 18% accuracy. Correction Scores mirrored this difference, with radiologists consistently scoring higher than LLMs (p<0.05).

CONCLUSION LLMs excel in text-based tasks but have notable limitations in visual artifact interpretation, making them unsuitable for independent diagnostics. They are promising as educational tools or adjuncts in "human-in-the-loop" systems, with multimodal AI improvements necessary to bridge these gaps.

Topics & Keywords

Medicine

Authors (8)

Y. Gunes

T. Cesur

E. Çamur

B. E. Çifçi

Turan Kaya

Mehmet Numan Colakoglu

Ural Koç

R. S. Okten

Citation Format

Gunes, Y., Cesur, T., Çamur, E., Çifçi, B.E., Kaya, T., Colakoglu, M.N. et al. (2025). Textual Proficiency and Visual Deficiency: A Comparative Study of Large Language Models and Radiologists in MRI Artifact Detection and Correction. https://doi.org/10.1016/j.acra.2025.01.004

Journal Information

Publication Year: 2025
Language: en
Total Citations: 4
Source Database: Semantic Scholar
DOI: 10.1016/j.acra.2025.01.004
Access: Open Access ✓