arXiv Open Access 2025

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, and 10 others

Abstract

We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate exceptional performance and scalability of HealthGPT on unified medical visual tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.

Topics & Keywords

cs.CV cs.AI

Authors (15)

Tianwei Lin

Wenqiao Zhang

Sijing Li

Yuqian Yuan

Binhe Yu

Haoyuan Li

Wanggui He

Hao Jiang

Mengze Li

Xiaohui Song

Siliang Tang

Jun Xiao

Hui Lin

Yueting Zhuang

Beng Chin Ooi

Citation Format

Lin, T., Zhang, W., Li, S., Yuan, Y., Yu, B., Li, H. et al. (2025). HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation. https://arxiv.org/abs/2502.09838

Journal Information

Publication Year: 2025
Language: en
Database Source: arXiv
Access: Open Access ✓