arXiv Open Access 2025

A Collection of Question Answering Datasets for Norwegian

Vladislav Mikhailov, Petter Mæhlum, Victoria Ovedie Chruickshank Langø, Erik Velldal, Lilja Øvrelid

Abstract

This paper introduces a new suite of question answering datasets for Norwegian: NorOpenBookQA, NorCommonSenseQA, NorTruthfulQA, and NRK-Quiz-QA. The data covers a wide range of skills and knowledge domains, including world knowledge, commonsense reasoning, truthfulness, and knowledge about Norway. Covering both written standards of Norwegian, Bokmål and Nynorsk, our datasets comprise over 10k question-answer pairs created by native speakers. We detail our dataset creation approach and present the results of evaluating 11 language models (LMs) in zero- and few-shot regimes. Most LMs perform better in Bokmål than in Nynorsk, struggle most with commonsense reasoning, and are often untruthful when generating answers to questions. All our datasets and annotation materials are publicly available.

Topics & Keywords

cs.CL cs.AI

Authors (5)

Vladislav Mikhailov

Petter Mæhlum

Victoria Ovedie Chruickshank Langø

Erik Velldal

Lilja Øvrelid

Citation Format

Mikhailov, V., Mæhlum, P., Langø, V. O. C., Velldal, E., & Øvrelid, L. (2025). A Collection of Question Answering Datasets for Norwegian. https://arxiv.org/abs/2501.11128

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access