arXiv Open Access 2024

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, +5 more

Abstract

In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choice questions across common sense and K-12 educational subjects, meticulously curated to reflect the breadth and depth of everyday and academic knowledge. We present an extensive evaluation of 12 state-of-the-art LLMs using FoundaBench, employing both traditional assessment methods and our CircularEval protocol to mitigate potential biases in model responses. Our results highlight the superior performance of models pre-trained on Chinese corpora, and reveal a significant disparity between models' reasoning and memory recall capabilities. The insights gleaned from FoundaBench evaluations set a new standard for understanding the fundamental knowledge of LLMs, providing a robust framework for future advancements in the field.
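The abstract mentions a CircularEval protocol for mitigating positional bias in multiple-choice answers. As a hedged illustration (the function names below are hypothetical, and the paper's exact procedure may differ), the core idea as used in prior benchmarks is to ask each question once per circular rotation of its options, and count the question correct only if the model answers every rotation correctly:

```python
from typing import Callable, Iterator, List, Tuple

def circular_shifts(options: List[str], answer_idx: int) -> Iterator[Tuple[List[str], int]]:
    """Yield every circular rotation of the options, with the answer's new index."""
    n = len(options)
    for k in range(n):
        rotated = options[k:] + options[:k]
        # The correct option moves from position answer_idx to (answer_idx - k) mod n.
        yield rotated, (answer_idx - k) % n

def circular_eval(predict: Callable[[List[str]], int],
                  options: List[str], answer_idx: int) -> bool:
    """A question counts as correct only if the model is right under all rotations."""
    return all(predict(rotated) == idx
               for rotated, idx in circular_shifts(options, answer_idx))
```

A model that always picks option "A" regardless of content would pass a single-pass evaluation on some questions by chance, but fails CircularEval, since the correct option occupies every position across the rotations.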


Authors (10)

Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

Citation

Li, W., Ma, R., Wu, J., Gu, C., Peng, J., Len, J. et al. (2024). FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models. https://arxiv.org/abs/2404.18359

Publication Information
Year Published
2024
Language
en
Source Database
arXiv
Access
Open Access ✓