arXiv Open Access 2026

Beyond Medical Diagnostics: How Medical Multimodal Large Language Models Think in Space

Quoc-Huy Trinh, Xi Ding, Yang Liu, Zhenyue Qin, Xingjian Li, +5 others
Abstract

Visual spatial intelligence is critical for medical image interpretation, yet remains largely unexplored in Multimodal Large Language Models (MLLMs) for 3D imaging. This gap persists due to a systemic lack of datasets featuring structured 3D spatial annotations beyond basic labels. In this study, we introduce an agentic pipeline that autonomously synthesizes spatial visual question-answering (VQA) data by orchestrating computational tools such as volume and distance calculators with multi-agent collaboration and expert radiologist validation. We present SpatialMed, the first comprehensive benchmark for evaluating 3D spatial intelligence in medical MLLMs, comprising nearly 10K question-answer pairs across multiple organs and tumor types. Our evaluations on 14 state-of-the-art MLLMs and extensive analyses reveal that current models lack robust spatial reasoning capabilities for medical imaging.
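
The abstract names volume and distance calculators as tools the pipeline orchestrates, but gives no implementation. The sketch below is only an illustration of what such calculators might compute from a 3D segmentation, not the authors' code. It assumes each structure is given as a list of (z, y, x) voxel indices and that the voxel spacing is known in millimetres. The function names `mask_volume_mm3`, `centroid_mm`, and `centroid_distance_mm`, and the choice of centroid-to-centroid distance, are hypothetical.

```python
import math

def mask_volume_mm3(voxels, spacing):
    """Volume of a structure: voxel count times the volume of one voxel."""
    dz, dy, dx = spacing
    return len(voxels) * dz * dy * dx

def centroid_mm(voxels, spacing):
    """Mean voxel index per axis, scaled to physical (mm) coordinates."""
    if not voxels:
        raise ValueError("structure has no voxels")
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n * spacing[i] for i in range(3))

def centroid_distance_mm(voxels_a, voxels_b, spacing):
    """Euclidean distance between the centroids of two structures (assumed metric)."""
    return math.dist(centroid_mm(voxels_a, spacing), centroid_mm(voxels_b, spacing))

# Toy example: a 2x2x2 "organ" and a 1x2x2 "tumor" with 2 mm slice spacing.
spacing = (2.0, 1.0, 1.0)  # (z, y, x) in mm, assumed
organ = [(z, y, x) for z in range(2) for y in range(2) for x in range(2)]
tumor = [(2, y, x) for y in range(2) for x in range(2)]

print(mask_volume_mm3(organ, spacing))              # 16.0
print(mask_volume_mm3(tumor, spacing))              # 8.0
print(centroid_distance_mm(organ, tumor, spacing))  # 3.0
```

Values computed this way could serve as ground-truth answers to spatial questions such as "Which structure is larger?" or "How far is the tumor from the organ?", which is the kind of question-answer pair the abstract describes.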

Authors (10)

Quoc-Huy Trinh
Xi Ding
Yang Liu
Zhenyue Qin
Xingjian Li
Gorkem Durak
Halil Ertugrul Aktas
Elif Keles
Ulas Bagci
Min Xu

Citation Format

Trinh, Q., Ding, X., Liu, Y., Qin, Z., Li, X., Durak, G. et al. (2026). Beyond Medical Diagnostics: How Medical Multimodal Large Language Models Think in Space. https://arxiv.org/abs/2603.13800

Journal Information
Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access ✓