arXiv Open Access 2025

MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs

Jianhui Wei, Zijie Meng, Zikai Xiao, Tianxiang Hu, Yang Feng, +3 others

Abstract

While Medical Large Language Models (MedLLMs) have demonstrated remarkable potential in clinical tasks, their ethical safety remains insufficiently explored. This paper introduces MedEthicsQA, a comprehensive benchmark comprising 5,623 multiple-choice questions and 5,351 open-ended questions for evaluating medical ethics in LLMs. We systematically establish a hierarchical taxonomy that integrates global medical ethical standards. The benchmark draws on widely used medical datasets, authoritative question banks, and scenarios derived from PubMed literature. Rigorous quality control, involving multi-stage filtering and multi-faceted expert validation, ensures the reliability of the dataset, with a low error rate (2.72%). Evaluations show that state-of-the-art MedLLMs perform worse on medical ethics questions than their foundation counterparts, revealing deficiencies in medical ethics alignment. The dataset, released under the CC BY-NC 4.0 license, is available at https://github.com/JianhuiWei7/MedEthicsQA.

Topics & Keywords

cs.CL cs.AI

Authors (8)

Jianhui Wei

Zijie Meng

Zikai Xiao

Tianxiang Hu

Yang Feng

Zhijie Zhou

Jian Wu

Zuozhu Liu

Citation Format

Wei, J., Meng, Z., Xiao, Z., Hu, T., Feng, Y., Zhou, Z., Wu, J., & Liu, Z. (2025). MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs. https://arxiv.org/abs/2506.22808

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓