arXiv Open Access 2023

NoCoLA: The Norwegian Corpus of Linguistic Acceptability

Matias Jentoft, David Samuel

Abstract

While there has been a surge of large language models for Norwegian in recent years, we lack any tool to evaluate their understanding of grammaticality. We present two new Norwegian datasets for this task. NoCoLA_class is a supervised binary classification task where the goal is to discriminate between acceptable and non-acceptable sentences. On the other hand, NoCoLA_zero is a purely diagnostic task for evaluating the grammatical judgement of a language model in a completely zero-shot manner, i.e. without any further training. In this paper, we describe both datasets in detail, show how to use them for different flavors of language models, and conduct a comparative study of the existing Norwegian language models.
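The zero-shot setup described above can be sketched as a minimal-pair comparison. This sketch assumes, as in BLiMP-style benchmarks, that each item pairs an acceptable sentence with an unacceptable variant, and that a model passes an item when it gives the acceptable sentence the higher score. `Scorer`, `zero_shot_accuracy` and the toy scorer are hypothetical names. In practice the scorer would wrap a model's log-likelihood, or a pseudo-log-likelihood for masked models. For the supervised NoCoLA_class task, the sketch uses the Matthews correlation coefficient. That metric is an assumption borrowed from English CoLA and is not stated in this abstract. The Norwegian sentence pair is an invented illustration of a verb-second (V2) word-order error, not an item taken from the dataset.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical interface: maps a sentence to a (pseudo-)log-likelihood
# under the language model being evaluated. Higher means "more acceptable".
Scorer = Callable[[str], float]


def zero_shot_accuracy(pairs: Sequence[Tuple[str, str]], score: Scorer) -> float:
    """Fraction of (acceptable, unacceptable) pairs in which the model gives
    the acceptable sentence a strictly higher score (ties count as errors)."""
    if not pairs:
        raise ValueError("need at least one sentence pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def matthews_corrcoef(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = acceptable).
    Returns 0.0 when the coefficient is undefined (a zero marginal)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    def toy_score(sentence: str) -> float:
        # Toy stand-in, NOT a real model: penalizes the V2-violating
        # main-clause order "ikke liker" (negation before the finite verb).
        return -1.0 if "ikke liker" in sentence else 0.0

    pairs = [("Jeg liker ikke fisk.", "Jeg ikke liker fisk.")]
    print(zero_shot_accuracy(pairs, toy_score))  # prints 1.0
    print(matthews_corrcoef([1, 0, 1, 0], [1, 0, 0, 0]))
```

Because the comparison is strictly pairwise, this kind of diagnostic needs no classification head or fine-tuning. Only the scorer changes between model families.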

Authors (2)

Matias Jentoft

David Samuel

Citation

Jentoft, M., Samuel, D. (2023). NoCoLA: The Norwegian Corpus of Linguistic Acceptability. https://arxiv.org/abs/2306.07790

Publication Information
Year Published
2023
Language
English
Source Database
arXiv
Access
Open Access