DOAJ Open Access 2026

Accuracy and reliability of Manus, ChatGPT, and Claude in case-based dental diagnosis

Ahmed A. Madfa, Abdullah F. Alshammari, Bassam A. Anazi, Yousef E. Alenezi, Khlood A. Alkurdi, +1 more

Abstract

Introduction: Artificial intelligence (AI), particularly large language models (LLMs), is transforming healthcare education and clinical decision-making. While models such as ChatGPT and Claude have demonstrated utility in medical contexts, their performance in dental diagnostics remains underexplored; additionally, the potential of emerging platforms such as Manus has yet to be evaluated.

Objective: To compare the diagnostic accuracy and consistency of ChatGPT, Claude, and Manus using authentic, case-based dental scenarios.

Methods: A set of 117 multiple-choice questions based on validated clinical dental vignettes spanning various specialities was administered to each model under standardised conditions at two separate time points. Responses were scored against expert-validated answer keys. Inter-rater reliability was assessed using Cohen's kappa, and statistical comparisons were made using the chi-square, McNemar, and t-tests.

Results: Claude and Manus consistently outperformed ChatGPT across both testing phases. In the second round, Claude and Manus each achieved a diagnostic accuracy of 92.3%, compared to ChatGPT's 76.9%. Claude and Manus also demonstrated higher intra-model consistency (Cohen's kappa = 0.714 and 0.782, respectively) than ChatGPT (kappa = 0.560). Although the numerical trends favoured Claude and Manus, pairwise differences in accuracy did not reach statistical significance.

Conclusion: Claude and Manus demonstrated numerically higher diagnostic performance and greater response stability than ChatGPT; however, these differences did not reach statistical significance and should therefore be interpreted cautiously. This variability across models highlights the need for larger-scale evaluations. These findings underscore the importance of considering both accuracy and consistency when selecting AI tools for integration into dental practice and curricula.
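The quantities named in the abstract (accuracy, Cohen's kappa between two test rounds, and a McNemar comparison) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the per-question data are invented for demonstration, and the counts 108 and 90 are only inferred as being consistent with the reported 92.3% and 76.9% of 117 questions. The exact binomial form of the McNemar test is an assumption; the abstract does not say which variant was used.

```python
from math import comb


def accuracy(correct, total):
    """Fraction of questions answered correctly."""
    return correct / total


def cohen_kappa(round1, round2):
    """Cohen's kappa for two equal-length lists of categorical labels.

    Here the two 'raters' are the same model at two time points,
    so kappa measures test-retest agreement beyond chance.
    """
    if len(round1) != len(round2) or len(round1) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(round1)
    labels = set(round1) | set(round2)
    # Observed agreement: share of items with identical labels.
    p_o = sum(x == y for x, y in zip(round1, round2)) / n
    # Chance agreement from the marginal label frequencies.
    p_e = sum((round1.count(l) / n) * (round2.count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # both rounds use one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: items correct in condition A only; c: items correct in B only.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts inferred from the reported percentages (assumption, see lead-in).
print(round(100 * accuracy(108, 117), 1))  # 92.3
print(round(100 * accuracy(90, 117), 1))   # 76.9

# Invented correct(1)/incorrect(0) outcomes for two rounds of one model.
r1 = [1, 1, 0, 0]
r2 = [1, 0, 1, 0]
print(cohen_kappa(r1, r2))  # 0.0: agreement no better than chance
```

Reporting kappa alongside accuracy matters because a model can reach similar accuracy in both rounds while answering different questions correctly each time; kappa captures that instability, whereas accuracy alone does not.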

Authors (6)

Ahmed A. Madfa

Abdullah F. Alshammari

Bassam A. Anazi

Yousef E. Alenezi

Khlood A. Alkurdi

Khlood A. Alkurdi

Citation Format

Madfa, A.A., Alshammari, A.F., Anazi, B.A., Alenezi, Y.E., Alkurdi, K.A., Alkurdi, K.A. (2026). Accuracy and reliability of Manus, ChatGPT, and Claude in case-based dental diagnosis. https://doi.org/10.3389/froh.2025.1686090

Journal Information
Year of Publication: 2026
Source Database: DOAJ
DOI: 10.3389/froh.2025.1686090
Access: Open Access ✓