arXiv Open Access 2023

M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models

Chuang Liu, Renren Jin, Yuqi Ren, Linhao Yu, Tianyu Dong, +8 others

Abstract

Large language models have recently made tremendous progress in a variety of aspects, such as cross-task generalization and instruction following. Comprehensively evaluating the capability of large language models across multiple tasks is of great importance. In this paper, we propose M3KE, a Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark, which is developed to measure the knowledge acquired by Chinese large language models by testing their multitask accuracy in zero- and few-shot settings. We have collected 20,477 questions from 71 tasks. Our selection covers all major levels of the Chinese education system, ranging from primary school to college, as well as a wide variety of subjects, including humanities, history, politics, law, education, psychology, science, technology, art and religion. All questions are multiple-choice questions with four options, which guarantees a standardized and unified assessment process. We have assessed a number of state-of-the-art open-source Chinese large language models on the proposed benchmark. The sizes of these models vary from 335M to 130B parameters. Experimental results demonstrate that they perform significantly worse than GPT-3.5, which reaches an accuracy of approximately 48% on M3KE. The dataset is available at https://github.com/tjunlp-lab/M3KE.
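The evaluation setup described in the abstract (four-option multiple-choice questions, scored by accuracy in zero- and few-shot settings) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the record fields (`task`, `question`, `options`, `answer`), the prompt layout, the helper names, and the choice of a micro-averaged overall score.

```python
from collections import defaultdict

# Hypothetical option labels for four-option multiple-choice questions.
CHOICES = ("A", "B", "C", "D")


def format_question(question, options, answer=None):
    """Render one question; include the answer only for few-shot examples."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICES, options)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)


def build_prompt(target, examples=()):
    """Zero-shot when `examples` is empty, few-shot otherwise."""
    parts = [
        format_question(e["question"], e["options"], e["answer"])
        for e in examples
    ]
    parts.append(format_question(target["question"], target["options"]))
    return "\n\n".join(parts)


def accuracy_by_task(records, predictions):
    """Return (per-task accuracy, overall accuracy over all questions)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec, pred in zip(records, predictions):
        total[rec["task"]] += 1
        correct[rec["task"]] += int(pred == rec["answer"])
    per_task = {task: correct[task] / total[task] for task in total}
    # Assumption: overall score is micro-averaged over questions.
    overall = sum(correct.values()) / sum(total.values())
    return per_task, overall


# Toy records, invented purely for illustration.
records = [
    {"task": "history", "question": "Q1", "options": ["a", "b", "c", "d"], "answer": "A"},
    {"task": "history", "question": "Q2", "options": ["a", "b", "c", "d"], "answer": "B"},
    {"task": "law", "question": "Q3", "options": ["a", "b", "c", "d"], "answer": "C"},
]
per_task, overall = accuracy_by_task(records, ["A", "C", "C"])
```

With the toy predictions above, one of the two history questions and the single law question are answered correctly. Averaging over tasks instead of over questions would give a different overall number, so the averaging choice should be checked against the paper before comparing scores.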

Topics & Keywords

cs.CL

Authors (13)

Chuang Liu

Renren Jin

Yuqi Ren

Linhao Yu

Tianyu Dong

Xiaohan Peng

Shuting Zhang

Jianxiang Peng

Peiyi Zhang

Qingqing Lyu

Xiaowen Su

Qun Liu

Deyi Xiong

Citation Format

Liu, C., Jin, R., Ren, Y., Yu, L., Dong, T., Peng, X. et al. (2023). M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models. https://arxiv.org/abs/2305.10263

Journal Information

Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access ✓