Semantic Scholar · Open Access · 2023 · 461 citations

CMMLU: Measuring massive multitask language understanding in Chinese

Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, +3 others

Abstract

As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging. This paper aims to bridge this gap by introducing CMMLU, a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities. We conduct a thorough evaluation of 18 advanced multilingual- and Chinese-oriented LLMs, assessing their performance across different subjects and settings. The results reveal that most existing LLMs struggle to achieve an average accuracy of 50%, even when provided with in-context examples and chain-of-thought prompts, whereas the random baseline stands at 25%. This highlights significant room for improvement in LLMs. Additionally, we conduct extensive experiments to identify factors impacting the models' performance and propose directions for enhancing LLMs. CMMLU fills the gap in evaluating the knowledge and reasoning capabilities of large language models within the Chinese context.
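The 25% random baseline cited in the abstract corresponds to uniform guessing over four answer options per question. A minimal sketch of that baseline (hypothetical answer key, not CMMLU data):

```python
import random

def random_baseline_accuracy(num_questions: int, num_choices: int = 4,
                             seed: int = 0) -> float:
    """Simulate a uniform-random guesser on multiple-choice questions."""
    rng = random.Random(seed)
    # Hypothetical answer key: correct option index for each question.
    answers = [rng.randrange(num_choices) for _ in range(num_questions)]
    guesses = [rng.randrange(num_choices) for _ in range(num_questions)]
    correct = sum(a == g for a, g in zip(answers, guesses))
    return correct / num_questions

acc = random_baseline_accuracy(100_000)
print(f"Random baseline accuracy: {acc:.3f}")  # ≈ 0.25 for 4 choices
```

With enough questions the simulated accuracy converges to 1/num_choices, i.e. 25% for a four-option format, which is the floor the evaluated LLMs are compared against.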

Topics & Keywords

Authors (8)

Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin

Citation Format

Li, H., Zhang, Y., Koto, F., Yang, Y., Zhao, H., Gong, Y. et al. (2023). CMMLU: Measuring massive multitask language understanding in Chinese. https://doi.org/10.48550/arXiv.2306.09212

Quick Access

View at Source: doi.org/10.48550/arXiv.2306.09212
Journal Information
Publication Year
2023
Language
en
Total Citations
461×
Source Database
Semantic Scholar
DOI
10.48550/arXiv.2306.09212
Access
Open Access ✓