arXiv Open Access 2025

NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark

Vladislav Mikhailov, Tita Enstad, David Samuel, Hans Christian Farsethås, Andrey Kutuzov, and 2 others

Abstract

This paper introduces NorEval, a new and comprehensive evaluation suite for large-scale standardized benchmarking of Norwegian generative language models (LMs). NorEval consists of 24 high-quality human-created datasets -- of which five are created from scratch. In contrast to existing benchmarks for Norwegian, NorEval covers a broad spectrum of task categories targeting Norwegian language understanding and generation, establishes human baselines, and focuses on both of the official written standards of the Norwegian language: Bokmål and Nynorsk. All our datasets and a collection of over 100 human-written prompts are integrated into LM Evaluation Harness, ensuring flexible and reproducible evaluation. We describe the NorEval design and present the results of benchmarking 19 open-source pre-trained and instruction-tuned LMs for Norwegian in various scenarios. Our benchmark, evaluation framework, and annotation materials are publicly available.

Authors (7)

Vladislav Mikhailov

Tita Enstad

David Samuel

Hans Christian Farsethås

Andrey Kutuzov

Erik Velldal

Lilja Øvrelid

Citation

Mikhailov, V., Enstad, T., Samuel, D., Farsethås, H.C., Kutuzov, A., Velldal, E. et al. (2025). NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark. https://arxiv.org/abs/2504.07749

Publication Information
Year Published: 2025
Language: en
Database Source: arXiv
Access: Open Access ✓