arXiv Open Access 2025

MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs

Jianhui Wei, Zijie Meng, Zikai Xiao, Tianxiang Hu, Yang Feng, +3 others

Abstract

While Medical Large Language Models (MedLLMs) have demonstrated remarkable potential in clinical tasks, their ethical safety remains insufficiently explored. This paper introduces $\textbf{MedEthicsQA}$, a comprehensive benchmark comprising $\textbf{5,623}$ multiple-choice questions and $\textbf{5,351}$ open-ended questions for the evaluation of medical ethics in LLMs. We systematically establish a hierarchical taxonomy integrating global medical ethical standards. The benchmark encompasses widely used medical datasets, authoritative question banks, and scenarios derived from PubMed literature. Rigorous quality control, involving multi-stage filtering and multi-faceted expert validation, ensures the reliability of the dataset, with a low error rate ($2.72\%$). Evaluations of state-of-the-art MedLLMs show that they perform worse on medical ethics questions than their foundation counterparts, elucidating the deficiencies of medical ethics alignment. The dataset, released under a CC BY-NC 4.0 license, is available at https://github.com/JianhuiWei7/MedEthicsQA.

Authors (8)

Jianhui Wei

Zijie Meng

Zikai Xiao

Tianxiang Hu

Yang Feng

Zhijie Zhou

Jian Wu

Zuozhu Liu

Citation Format

Wei, J., Meng, Z., Xiao, Z., Hu, T., Feng, Y., Zhou, Z. et al. (2025). MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs. https://arxiv.org/abs/2506.22808

Journal Information
Publication Year
2025
Language
en
Source Database
arXiv
Access
Open Access ✓