arXiv Open Access 2024

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, +5 more

Abstract

In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choice questions across common sense and K-12 educational subjects, meticulously curated to reflect the breadth and depth of everyday and academic knowledge. We present an extensive evaluation of 12 state-of-the-art LLMs using FoundaBench, employing both traditional assessment methods and our CircularEval protocol to mitigate potential biases in model responses. Our results highlight the superior performance of models pre-trained on Chinese corpora, and reveal a significant disparity between models' reasoning and memory recall capabilities. The insights gleaned from FoundaBench evaluations set a new standard for understanding the fundamental knowledge of LLMs, providing a robust framework for future advancements in the field.
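The abstract mentions a CircularEval protocol for mitigating positional bias in multiple-choice answers. As a hedged illustration (the function names below are hypothetical, and the paper's exact procedure may differ), the core idea as used in prior benchmarks is to ask each question once per circular rotation of its options, and count the question correct only if the model answers every rotation correctly:

```python
from typing import Callable, Iterator, List, Tuple

def circular_shifts(options: List[str], answer_idx: int) -> Iterator[Tuple[List[str], int]]:
    """Yield every circular rotation of the options, with the answer's new index."""
    n = len(options)
    for k in range(n):
        rotated = options[k:] + options[:k]
        # The correct option moves from position answer_idx to (answer_idx - k) mod n.
        yield rotated, (answer_idx - k) % n

def circular_eval(predict: Callable[[List[str]], int],
                  options: List[str], answer_idx: int) -> bool:
    """A question counts as correct only if the model is right under all rotations."""
    return all(predict(rotated) == idx
               for rotated, idx in circular_shifts(options, answer_idx))
```

A model that always picks option "A" regardless of content would pass a single-pass evaluation on some questions by chance, but fails CircularEval, since the correct option occupies every position across the rotations.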


Authors (10)

Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

Citation

Li, W., Ma, R., Wu, J., Gu, C., Peng, J., Len, J. et al. (2024). FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models. https://arxiv.org/abs/2404.18359

Publication Information
Year Published
2024
Language
en
Source Database
arXiv
Access
Open Access ✓