arXiv Open Access 2026

Deliberative multi-agent large language models improve clinical reasoning in ophthalmology

Ehsan Misaghi, Sean T Berkowitz, Bing Yu Chen, Qingyu Chen, Renaud Duval, +8 others

Abstract

Large language models (LLMs) show potential for ophthalmic clinical reasoning, yet individual models risk introducing harm. We evaluated whether multi-agent LLM deliberative councils improve diagnostic performance and mitigate harm compared with individual LLMs. In a comparative cross-sectional study, we assessed 12 individual LLMs and three multi-agent councils on 100 ophthalmology clinical vignettes. Each council comprised four models of a single type: proprietary flagship, proprietary fast, or open-source. For each vignette, the models answered independently and anonymously ranked one another's responses; a designated chair then synthesized all responses and peer reviews into a final answer. Councils consistently outperformed pooled individual models across all three tiers. Accuracy improved for proprietary flagship (95.0% vs 90.8%; risk difference [RD]: 4.25 [95% CI: 0.45, 8.05]), proprietary fast (96.0% vs 86.5%; RD: 9.50 [5.31, 13.59]), and open-source councils (91.0% vs 83.2%; RD: 7.75 [4.17, 11.33]). Harm rates declined for proprietary flagship (10.0% vs 22.5%; RD: -12.50 [-16.86, -8.14]), proprietary fast (16.0% vs 31.8%; RD: -15.75 [-21.49, -10.01]), and open-source councils (22.0% vs 38.5%; RD: -16.50 [-22.27, -10.73]). Coverage analysis revealed net positive gains for accuracy (ΔCoverage: 4.4-9.8 percentage points) and safety (ΔCoverage: 13.6-20.6 percentage points), indicating that councils recovered correct diagnoses and averted harm. Councils elevated correct diagnoses to higher rank positions and produced more complete differentials and management plans (all P<.05). Harmful council responses showed fewer combined commission-and-omission errors and tended to be less severe. Structured deliberation via multi-agent LLM councils may enhance the reliability of LLM-assisted ophthalmic clinical reasoning.
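
The three-stage council procedure described in the abstract (independent answers, anonymous peer ranking, chair synthesis) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Member`, `run_council`, and the "Response A/B/C/D" labels are hypothetical, the model calls are left as plain callables, and the sketch assumes that each member ranks only the other members' responses, which is one reading of "ranked one another's responses".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Member:
    """One council member (hypothetical wrapper around a model call)."""
    name: str
    # answer(vignette) -> free-text response
    answer: Callable[[str], str]
    # rank(vignette, {label: response}) -> labels ordered best to worst
    rank: Callable[[str, Dict[str, str]], List[str]]


def run_council(
    vignette: str,
    members: List[Member],
    chair: Callable[[str, Dict[str, str], List[List[str]]], str],
) -> str:
    # Stage 1: every member answers the vignette independently.
    responses = [m.answer(vignette) for m in members]

    # Anonymize: model names are replaced by neutral labels.
    labels = [f"Response {chr(ord('A') + i)}" for i in range(len(members))]
    anonymized = dict(zip(labels, responses))

    # Stage 2: each member ranks the other members' anonymized responses
    # (assumption: a member does not rank its own response).
    reviews: List[List[str]] = []
    for i, member in enumerate(members):
        others = {lab: txt for lab, txt in anonymized.items() if lab != labels[i]}
        reviews.append(member.rank(vignette, others))

    # Stage 3: the chair sees all responses and all peer rankings
    # and returns the council's final answer.
    return chair(vignette, anonymized, reviews)
```

In a real setting, `answer`, `rank`, and `chair` would wrap prompts to the underlying models; keeping them as injected callables lets the orchestration be tested with simple stubs.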

Authors (13)

Ehsan Misaghi
Sean T Berkowitz
Bing Yu Chen
Qingyu Chen
Renaud Duval
Pearse A Keane
Danny A Mammo
Ariel Yuhan Ong
Mertcan Sevgi
Sumit Sharma
Sunil K Srivastava
Yih Chung Tham
Fares Antaki

Citation

Misaghi, E., Berkowitz, S.T., Chen, B.Y., Chen, Q., Duval, R., Keane, P.A. et al. (2026). Deliberative multi-agent large language models improve clinical reasoning in ophthalmology. https://arxiv.org/abs/2603.21447

Journal Information
Publication year: 2026
Language: English
Source database: arXiv
Access: Open Access