EVLF-FM: Explainable Vision Language Foundation Model for Medicine
Abstract
Despite the promise of foundation models in medical AI, current systems remain limited: they are modality-specific and lack transparent reasoning processes, which hinders clinical adoption. To address this gap, we present EVLF-FM, a multimodal vision-language foundation model (VLM) designed to unify broad diagnostic capability with fine-grained explainability. The development and testing of EVLF-FM encompassed over 1.3 million samples from 23 global datasets across eleven imaging modalities related to six clinical specialties: dermatology, hepatology, ophthalmology, pathology, pulmonology, and radiology. External validation employed 8,884 independent test samples from 10 additional datasets across five imaging modalities. Technically, EVLF-FM is developed to assist with multiple disease diagnosis and visual question answering, with pixel-level visual grounding and reasoning capabilities. In internal validation for disease diagnostics, EVLF-FM achieved the highest average accuracy (0.858) and F1-score (0.797), outperforming leading generalist and specialist models. In medical visual grounding, EVLF-FM also performed strongly across nine modalities, with an average mIoU of 0.743 and Acc@0.5 of 0.837. External validation further confirmed strong zero-shot and few-shot performance, with competitive F1-scores despite a smaller model size. Through a hybrid training strategy combining supervised and visual reinforcement fine-tuning, EVLF-FM not only achieves state-of-the-art accuracy but also exhibits step-by-step reasoning, aligning outputs with visual evidence. EVLF-FM is an early multi-disease VLM with explainability and reasoning capabilities that could advance adoption of and trust in foundation models for real-world clinical deployment.
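The abstract reports visual grounding quality as mIoU and Acc@0.5 without defining them. The sketch below shows how these two metrics are commonly computed. It is not the authors' evaluation code. It assumes axis-aligned bounding boxes in `(x_min, y_min, x_max, y_max)` form and counts a prediction as correct when its IoU is at least 0.5. The box layout, the function names `box_iou` and `grounding_metrics`, and the inclusive threshold are all illustrative assumptions, and the paper may evaluate on masks or use a different convention.

```python
from typing import Sequence, Tuple

# Hypothetical box layout for illustration: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_metrics(
    preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (mean IoU, fraction of samples with IoU >= threshold)."""
    if len(preds) != len(gts) or len(preds) == 0:
        raise ValueError("preds and gts must be non-empty and the same length")
    ious = [box_iou(p, g) for p, g in zip(preds, gts)]
    miou = sum(ious) / len(ious)
    # Assumption: inclusive threshold; some benchmarks use a strict '>'.
    acc = sum(iou >= threshold for iou in ious) / len(ious)
    return miou, acc


if __name__ == "__main__":
    predictions = [(0, 0, 2, 2), (0, 0, 2, 2)]
    ground_truth = [(1, 1, 3, 3), (0, 0, 2, 2)]
    print(grounding_metrics(predictions, ground_truth))
```

Under this reading, an Acc@0.5 of 0.837 would mean that roughly 84% of predicted regions overlap their reference region with IoU of at least 0.5, while mIoU averages the overlap itself over all samples.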
Authors (43)
Yang Bai
Haoran Cheng
Yang Zhou
Jun Zhou
Arun Thirunavukarasu
Yuhe Ke
Jie Yao
Kanae Fukutsu
Chrystie Wan Ning Quek
Ashley Hong
Laura Gutierrez
Zhen Ling Teo
Darren Shu Jeng Ting
Brian T. Soetikno
Christopher S. Nielsen
Tobias Elze
Zengxiang Li
Linh Le Dinh
Hiok Hong Chan
Victor Koh
Marcus Tan
Kelvin Z. Li
Leonard Yip
Ching Yu Cheng
Yih Chung Tham
Gavin Siew Wei Tan
Leopold Schmetterer
Marcus Ang
Rahat Hussain
Jod Mehta
Tin Aung
Lionel Tim-Ee Cheng
Tran Nguyen Tuan Anh
Chee Leong Cheng
Tien Yin Wong
Nan Liu
Iain Beehuat Tan
Soon Thye Lim
Eyal Klang
Tony Kiat Hon Lim
Rick Siow Mong Goh
Yong Liu
Daniel Shu Wei Ting
Quick Access
- Year of Publication: 2025
- Language: en
- Source Database: arXiv
- Access: Open Access ✓