arXiv Open Access 2024

Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine

Qiao Jin, Fangyuan Chen, Yiliang Zhou, Ziyang Xu, Justin M. Cheung, +13 others

Abstract

Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations focused primarily on the accuracy of multiple-choice answers alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales for image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians in multiple-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well on cases that physicians answer incorrectly, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choice (35.5%), most prominently in image comprehension (27.2%). Despite GPT-4V's high accuracy on multiple-choice questions, our findings emphasize the necessity of further in-depth evaluation of its rationales before integrating such multimodal AI models into clinical workflows.


Authors (18)

Qiao Jin
Fangyuan Chen
Yiliang Zhou
Ziyang Xu
Justin M. Cheung
Robert Chen
Ronald M. Summers
Justin F. Rousseau
Peiyun Ni
Marc J Landsman
Sally L. Baxter
Subhi J. Al'Aref
Yijia Li
Alex Chen
Josef A. Brejt
Michael F. Chiang
Yifan Peng
Zhiyong Lu

Citation Format

Jin, Q., Chen, F., Zhou, Y., Xu, Z., Cheung, J.M., Chen, R. et al. (2024). Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine. https://arxiv.org/abs/2401.08396

Journal Information
Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access