arXiv Open Access 2025

DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice

Zijie Meng, Jin Hao, Xiwei Dai, Yang Feng, Jiaxiang Liu, +18 others

Abstract

Diagnosing and managing oral diseases necessitate advanced visual interpretation across diverse imaging modalities and integrated information synthesis. While current AI models excel at isolated tasks, they often fall short in addressing the complex, multimodal requirements of comprehensive clinical dental practice. Here we introduce DentVLM, a multimodal vision-language model engineered for expert-level oral disease diagnosis. DentVLM was developed using a comprehensive, large-scale, bilingual dataset of 110,447 images and 2.46 million visual question-answering (VQA) pairs. The model can interpret seven 2D oral imaging modalities across 36 diagnostic tasks, significantly outperforming leading proprietary and open-source models, with 19.6% higher accuracy for oral diseases and 27.9% higher accuracy for malocclusions. In a clinical study involving 25 dentists, 1,946 patients, and 3,105 QA pairs, DentVLM surpassed the diagnostic performance of 13 junior dentists on 21 of 36 tasks and exceeded that of 12 senior dentists on 12 of 36 tasks. When integrated into a collaborative workflow, DentVLM elevated junior dentists' performance to senior levels and reduced diagnostic time for all practitioners by 15-22%. Furthermore, DentVLM exhibited promising performance across three practical utility scenarios: home-based dental health management, hospital-based intelligent diagnosis, and multi-agent collaborative interaction. These findings establish DentVLM as a robust clinical decision support tool, poised to enhance primary dental care, mitigate provider-patient imbalances, and democratize access to specialized medical expertise within dentistry.

Topics & Keywords

cs.CV cs.AI

Authors (23)

Zijie Meng

Jin Hao

Xiwei Dai

Yang Feng

Jiaxiang Liu

Bin Feng

Huikai Wu

Xiaotang Gai

Hengchuan Zhu

Tianxiang Hu

Yangyang Wu

Hongxia Xu

Jin Li

Jun Xiao

Xiaoqiang Liu

Joey Tianyi Zhou

Fudong Zhu

Zhihe Zhao

Lunguo Xia

Bing Fang

Jimeng Sun

Jian Wu

Zuozhu Liu

Citation Format

Meng, Z., Hao, J., Dai, X., Feng, Y., Liu, J., Feng, B. et al. (2025). DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice. https://arxiv.org/abs/2509.23344

Journal Information

Publication Year: 2025
Language: English
Source Database: arXiv
Access: Open Access ✓