arXiv Open Access 2025

NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark

Vladislav Mikhailov, Tita Enstad, David Samuel, Hans Christian Farsethås, Andrey Kutuzov, +2 others

Abstract

This paper introduces NorEval, a new and comprehensive evaluation suite for large-scale standardized benchmarking of Norwegian generative language models (LMs). NorEval consists of 24 high-quality human-created datasets, five of which were created from scratch. In contrast to existing benchmarks for Norwegian, NorEval covers a broad spectrum of task categories targeting Norwegian language understanding and generation, establishes human baselines, and focuses on both official written standards of the Norwegian language: Bokmål and Nynorsk. All our datasets and a collection of over 100 human-written prompts are integrated into LM Evaluation Harness, ensuring flexible and reproducible evaluation. We describe the NorEval design and present the results of benchmarking 19 open-source pre-trained and instruction-tuned LMs for Norwegian in various scenarios. Our benchmark, evaluation framework, and annotation materials are publicly available.

Topics & Keywords

cs.CL cs.AI

Authors (7)

Vladislav Mikhailov

Tita Enstad

David Samuel

Hans Christian Farsethås

Andrey Kutuzov

Erik Velldal

Lilja Øvrelid

Citation

Mikhailov, V., Enstad, T., Samuel, D., Farsethås, H.C., Kutuzov, A., Velldal, E. et al. (2025). NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark. https://arxiv.org/abs/2504.07749

Journal Information

Year Published: 2025
Language: en
Database Source: arXiv
Access: Open Access