arXiv Open Access 2025

Describe Anything in Medical Images

Xi Xiao, Yunbei Zhang, Thanh-Huy Nguyen, Ba-Thinh Lam, Janet Wang, +8 others

Abstract

Localized image captioning has made significant progress with models like the Describe Anything Model (DAM), which can generate detailed region-specific descriptions without explicit region-text supervision. However, such capabilities have yet to be widely applied to specialized domains like medical imaging, where diagnostic interpretation relies on subtle regional findings rather than global understanding. To bridge this gap, we propose MedDAM, the first comprehensive framework leveraging large vision-language models for region-specific captioning in medical images. MedDAM employs medical expert-designed prompts tailored to specific imaging modalities and establishes a robust evaluation benchmark comprising a customized assessment protocol, a data pre-processing pipeline, and a specialized QA template library. This benchmark evaluates both MedDAM and other adaptable large vision-language models, focusing on clinical factuality through attribute-level verification tasks, thereby circumventing the absence of ground-truth region-caption pairs in medical datasets. Extensive experiments on the VinDr-CXR, LIDC-IDRI, and SkinCon datasets demonstrate MedDAM's superiority over leading peers (including GPT-4o, Claude 3.7 Sonnet, LLaMA-3.2 Vision, Qwen2.5-VL, GPT4RoI, and OMG-LLaVA) on this task, revealing the importance of region-level semantic alignment in medical image understanding and establishing MedDAM as a promising foundation for clinical vision-language integration.

Authors (13)

Xi Xiao
Yunbei Zhang
Thanh-Huy Nguyen
Ba-Thinh Lam
Janet Wang
Lin Zhao
Jihun Hamm
Tianyang Wang
Xingjian Li
Xiao Wang
Hao Xu
Tianming Liu
Min Xu

Citation Format

Xiao, X., Zhang, Y., Nguyen, T., Lam, B., Wang, J., Zhao, L. et al. (2025). Describe Anything in Medical Images. https://arxiv.org/abs/2505.05804

Journal Information
Publication Year
2025
Language
en
Source Database
arXiv
Access
Open Access ✓