arXiv Open Access 2024

Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation

Changsun Lee, Sangjoon Park, Cheong-Il Shin, Woo Hee Choi, Hyun Jeong Park, +2 others

Abstract

Recent medical vision-language models (VLMs) have shown promise in 2D medical image interpretation. However, extending them to 3D medical imaging has been challenging due to computational complexity and data scarcity. Although a few recent VLMs designed specifically for 3D medical imaging have emerged, all are limited to learning the volumetric representation of a 3D medical image as a set of sub-volumetric features. Such a process introduces overly correlated representations along the z-axis that neglect slice-specific clinical details, particularly for 3D medical images where adjacent slices have low redundancy. To address this limitation, we introduce MS-VLM, which mimics radiologists' workflow in 3D medical image interpretation. Specifically, radiologists analyze 3D medical images by examining individual slices sequentially and synthesizing information across slices and views. Likewise, MS-VLM leverages self-supervised 2D transformer encoders to learn a volumetric representation that captures inter-slice dependencies from a sequence of slice-specific features. Unbound by sub-volumetric patchification, MS-VLM is capable of obtaining useful volumetric representations from 3D medical images of any slice length and from multiple images acquired from different planes and phases. We evaluate MS-VLM on the publicly available chest CT dataset CT-RATE and an in-house rectal MRI dataset. In both scenarios, MS-VLM surpasses existing methods in radiology report generation, producing more coherent and clinically relevant reports. These findings highlight the potential of MS-VLM to advance 3D medical image interpretation and improve the robustness of medical VLMs.
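The slice-wise idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `encode_slice` is a hypothetical stand-in for a pretrained 2D encoder, `inter_slice_attention` is a single-head self-attention with identity projections chosen for brevity, and `make_volume` and `encode_study` are illustrative helpers. The sketch only shows that encoding each slice separately and then attending across the slice sequence places no constraint on the number of slices or on how many series are combined.

```python
import math

def encode_slice(slice_2d, dim=4):
    # Hypothetical stand-in for a 2D encoder: average the pixels in `dim`
    # horizontal bands to get one feature vector per slice.
    # Assumes the slice height is a positive multiple of `dim`.
    band = len(slice_2d) // dim
    feats = []
    for b in range(dim):
        pixels = [p for row in slice_2d[b * band:(b + 1) * band] for p in row]
        feats.append(sum(pixels) / len(pixels))
    return feats

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def inter_slice_attention(features):
    # Single-head self-attention over the slice sequence, with identity
    # query/key/value projections (an assumption made to keep this short).
    dim = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in features]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, features)) for j in range(dim)])
    return out

def encode_study(series_list, dim=4):
    # Concatenate slice features from every series (e.g. different planes
    # or phases) into one sequence, then mix information across slices.
    features = [encode_slice(s, dim) for volume in series_list for s in volume]
    return inter_slice_attention(features)

def make_volume(depth, height=8, width=8):
    # Synthetic volume stored as depth x height x width nested lists.
    return [[[float(z + y + x) for x in range(width)] for y in range(height)]
            for z in range(depth)]

axial = make_volume(depth=5)      # 5 slices
sagittal = make_volume(depth=3)   # 3 slices, a different series
tokens = encode_study([axial, sagittal])
print(len(tokens), len(tokens[0]))  # prints: 8 4
```

The output has one contextualized token per input slice (5 + 3 = 8), so a study with a different slice count or an extra series simply yields a longer sequence, whereas a fixed sub-volumetric patch grid would need the depth to fit its patch size.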

Topics & Keywords

eess.IV cs.CL cs.CV cs.LG

Authors (7)

Changsun Lee

Sangjoon Park

Cheong-Il Shin

Woo Hee Choi

Hyun Jeong Park

Jeong Eun Lee

Jong Chul Ye

Citation Format

Lee, C., Park, S., Shin, C., Choi, W.H., Park, H.J., Lee, J.E. et al. (2024). Read Like a Radiologist: Efficient Vision-Language Model for 3D Medical Imaging Interpretation. https://arxiv.org/abs/2412.13558

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓