arXiv Open Access 2025

Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles

Samia Touileb, Vladislav Mikhailov, Marie Kroka, Lilja Øvrelid, Erik Velldal

Abstract

We introduce a dataset of high-quality human-authored summaries of news articles in Norwegian. The dataset is intended for benchmarking the abstractive summarisation capabilities of generative language models. Each document in the dataset is provided with three different candidate gold-standard summaries written by native Norwegian speakers, and all summaries are provided in both of the written variants of Norwegian -- Bokmål and Nynorsk. The paper describes the data creation effort as well as an evaluation of existing open LLMs for Norwegian on the dataset. We also provide insights from a manual human evaluation, comparing human-authored to model-generated summaries. Our results indicate that the dataset provides a challenging LLM benchmark for Norwegian summarisation capabilities.

Topics & Keywords

cs.CL

Authors (5)

Samia Touileb

Vladislav Mikhailov

Marie Kroka

Lilja Øvrelid

Erik Velldal

Citation Format

Touileb, S., Mikhailov, V., Kroka, M., Øvrelid, L., & Velldal, E. (2025). Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles. https://arxiv.org/abs/2501.07718

Journal Information

Year of Publication: 2025
Language: en
Source Database: arXiv
Access: Open Access