Challenges of using generative artificial intelligence for diabetes patient education: a cross-platform analysis of the quality, readability, and actionability of text generated by large language models
Abstract
Objective: To compare, across large language model (LLM) platforms, the quality, readability, and completeness of action-oriented instructions in diabetes self-management education texts, and to quantify the associations among these domains to inform model selection and risk mitigation.

Methods: Ten LLM platforms were used to generate diabetes education texts (total n = 200), stratified by topic. Outcomes included the Global Quality Score (GQS), the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), and EQIP-36 (Ensuring Quality Information for Patients, 36-item version). Text characteristics, including word count, sentence count, and syllable count, were recorded. Readability was assessed using the Automated Readability Index (ARI), Coleman–Liau Index (CLI), Flesch–Kincaid Grade Level (FKGL), Flesch Reading Ease Score (FRES), Gunning Fog Index (GFOG), Linsear Write (LW), and the Simple Measure of Gobbledygook (SMOG). Between-platform differences were evaluated using one-way ANOVA or the Kruskal–Wallis test, as appropriate. Associations between readability indices and GQS, PEMAT-P, and EQIP-36 were examined using correlation heat maps and exploratory stepwise multiple linear regression. Because the readability indices were highly intercorrelated, these regression analyses were considered exploratory and were used to identify candidate readability-related correlates rather than definitive independent predictors.

Results: GQS and PEMAT-P differed significantly across platforms (both p < 0.001), whereas EQIP-36 did not (p = 0.062). Text length and readability also varied by platform (most p < 0.001). After stratification by topic, PEMAT-P understandability, PEMAT-P total score, and GQS no longer differed significantly across topics (p = 0.356, p = 0.247, and p = 0.182, respectively), whereas PEMAT-P actionability (p < 0.001), EQIP-36 (p < 0.001), and several readability metrics remained significantly different. Difficulty indices were strongly intercorrelated, and FRES was inversely associated with multiple difficulty indices. Exploratory regression analyses suggested that greater reading burden tended to co-occur with lower GQS, PEMAT-P, and EQIP-36 scores.

Conclusion: LLM-generated diabetes education texts exhibit marked cross-platform heterogeneity, and exploratory analyses suggest a potential trade-off between readability and both information quality and the completeness of action-oriented instructions. Clinical implementation should therefore combine careful platform selection, structured prompting with templates, human–AI review, and continuous quality monitoring to support safe, readable, and actionable patient education.
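Two of the readability indices named in the Methods, FKGL and FRES, are computed directly from the word, sentence, and syllable counts the study recorded. A minimal sketch using the standard published formulas (the counts below are hypothetical illustrations, not the paper's data):

```python
def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: higher = harder (approximate US grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def fres(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score: higher = easier (roughly 0-100)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Hypothetical counts for one generated education text
print(round(fkgl(1200, 80, 1900), 2))  # → 8.94
print(round(fres(1200, 80, 1900), 2))  # → 57.66
```

Because both formulas depend on the same words-per-sentence and syllables-per-word ratios, they are mechanically (inversely) related, which is consistent with the strong intercorrelation among difficulty indices reported above.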
Topics & Keywords
Authors (8)
Zhiqiang Wang
Zhiqiang Wang
Xiaoya Li
Xianglan Tao
Jie Li
Li Zhang
Xiaorong He
Jing Yang
Citation Format
Quick Access
- Publication Year
- 2026
- Source Database
- DOAJ
- DOI
- 10.3389/fpubh.2026.1804524
- Access
- Open Access ✓