Comparative Evaluation of Vision–Language Models for Detecting and Localizing Dental Lesions from Intraoral Images
Abstrak
To assess the efficiency of vision–language models in detecting and classifying carious and non-carious lesions from intraoral photo imaging. A dataset of 172 annotated images were classified for microcavitation, cavitated lesions, staining, calculus, and non-carious lesions. Florence-2, PaLI-Gemma, and YOLOv8 models were trained on the dataset and model performance. The dataset was divided into 80:10:10 split, and the model performance was evaluated using mean average precision (mAP), mAP50-95, class-specific precision and recall. YOLOv8 outperformed the vision–language models, achieving a mean average precision (mAP) of 37% with a precision of 42.3% (with 100% for cavitation detection) and 31.3% recall. PaLI-Gemma produced a recall of 13% and 21%. Florence-2 yielded a mean average precision of 10% with a precision and recall was 51% and 35%. YOLOv8 achieved the strongest overall performance. Florence-2 and PaLI-Gemma models underperformed relative to YOLOv8 despite the potential for multimodal contextual understanding, highlighting the need for larger, more diverse datasets and hybrid architectures to achieve improved performance.
Topik & Kata Kunci
Penulis (7)
Maria Jahan
Al Ibne Siam
Lamim Zakir Pronay
Saif Ahmed
Nabeel Mohammed
James Dudley
Taseef Hasan Farook
Akses Cepat
- Tahun Terbit
- 2026
- Sumber Database
- DOAJ
- DOI
- 10.3390/jimaging12010022
- Akses
- Open Access ✓