arXiv Open Access 2026

EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams

JaeSeong Kim, Chaehwan Lim, Sang Hyun Gil, Suan Lee

Abstract

We present EuraGovExam, a multilingual and multimodal benchmark sourced from real-world civil service examinations across five representative Eurasian regions: South Korea, Japan, Taiwan, India, and the European Union. Designed to reflect the authentic complexity of public-sector assessments, the dataset contains over 8,000 high-resolution scanned multiple-choice questions covering 17 diverse academic and administrative domains. Unlike existing benchmarks, EuraGovExam embeds all question content (problem statements, answer choices, and visual elements) within a single image, providing only a minimal standardized instruction for answer formatting. This design demands that models perform layout-aware, cross-lingual reasoning directly from visual input. All items are drawn from real exam documents, preserving rich visual structures such as tables, multilingual typography, and form-like layouts. Evaluation results show that even state-of-the-art vision-language models (VLMs) achieve only 86% accuracy, underscoring the benchmark's difficulty and its power to diagnose the limitations of current models. By emphasizing cultural realism, visual complexity, and linguistic diversity, EuraGovExam establishes a new standard for evaluating VLMs in high-stakes, multilingual, image-grounded settings. It also supports practical applications in e-governance, public-sector document analysis, and equitable exam preparation.

Authors (4)

JaeSeong Kim

Chaehwan Lim

Sang Hyun Gil

Suan Lee

Citation Format

Kim, J., Lim, C., Gil, S.H., Lee, S. (2026). EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams. https://arxiv.org/abs/2603.27223

Journal Information
Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access