arXiv Open Access 2025

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, +10 others

Abstract

We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate exceptional performance and scalability of HealthGPT in unified medical visual tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.

Authors (15)

Tianwei Lin
Wenqiao Zhang
Sijing Li
Yuqian Yuan
Binhe Yu
Haoyuan Li
Wanggui He
Hao Jiang
Mengze Li
Xiaohui Song
Siliang Tang
Jun Xiao
Hui Lin
Yueting Zhuang
Beng Chin Ooi

Citation Format

Lin, T., Zhang, W., Li, S., Yuan, Y., Yu, B., Li, H. et al. (2025). HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation. https://arxiv.org/abs/2502.09838

Journal Information

Year of Publication: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓