DOAJ Open Access 2025

Large language models’ performances regarding common patient questions about osteoarthritis: A comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Perplexity

Mingde Cao, Qianwen Wang, Xueyou Zhang, Zuru Liang, Jihong Qiu, +2 others

Abstract

Background: Large Language Models (LLMs) have gained much attention and have partly replaced common search engines as a popular channel for obtaining information because of their contextually relevant responses. Osteoarthritis (OA) is a common topic in musculoskeletal disorders, and patients often seek information about it online. Our study evaluated the ability of 3 LLMs (ChatGPT-3.5, ChatGPT-4.0, and Perplexity) to accurately answer common OA-related queries. Methods: We defined 6 themes (pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis) based on a generalization of 25 frequently asked questions about OA. Three consultant-level orthopedic specialists independently rated the LLMs' replies on a 4-point accuracy scale. The final rating for each response was determined using a majority consensus approach. Responses classified as “satisfactory” were evaluated for comprehensiveness on a 5-point scale. Results: ChatGPT-4.0 demonstrated superior accuracy, with 64% of responses rated as “excellent”, compared to 40% for ChatGPT-3.5 and 28% for Perplexity (Pearson's χ² test with Fisher's exact test, all p < 0.001). All 3 LLM-chatbots had high mean comprehensiveness ratings (Perplexity = 3.88; ChatGPT-4.0 = 4.56; ChatGPT-3.5 = 3.96, out of a maximum score of 5). The LLM-chatbots performed reliably across domains, except for “treatment and prevention”. However, ChatGPT-4.0 still outperformed ChatGPT-3.5 and Perplexity, garnering 53.8% “excellent” ratings (Pearson's χ² test with Fisher's exact test, all p < 0.001). Conclusion: Our findings underscore the potential of LLMs, specifically ChatGPT-4.0 and Perplexity, to deliver accurate and thorough responses to OA-related queries. Targeted correction of specific misconceptions to improve the accuracy of LLMs remains crucial.

Authors (7)

Mingde Cao

Qianwen Wang

Xueyou Zhang

Zuru Liang

Jihong Qiu

Patrick Shu-Hang Yung

Michael Tim-Yun Ong

Citation Format

Cao, M., Wang, Q., Zhang, X., Liang, Z., Qiu, J., Yung, P.S. et al. (2025). Large language models’ performances regarding common patient questions about osteoarthritis: A comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Perplexity. https://doi.org/10.1016/j.jshs.2024.101016

Journal Information

Publication Year: 2025
Source Database: DOAJ
DOI: 10.1016/j.jshs.2024.101016
Access: Open Access