arXiv Open Access 2025

Multi-label Scandinavian Language Identification (SLIDE)

Mariia Fedorova, Jonas Sebulon Frydenberg, Victoria Handford, Victoria Ovedie Chruickshank Langø, Solveig Helene Willoch, +4 others

Abstract

Identifying closely related languages at sentence level is difficult, in particular because it is often impossible to assign a sentence to a single language. In this paper, we focus on multi-label sentence-level Scandinavian language identification (LID) for Danish, Norwegian Bokmål, Norwegian Nynorsk, and Swedish. We present the Scandinavian Language Identification and Evaluation, SLIDE, a manually curated multi-label evaluation dataset and a suite of LID models with varying speed-accuracy tradeoffs. We demonstrate that the ability to identify multiple languages simultaneously is necessary for any accurate LID method, and present a novel approach to training such multi-label LID models.

Authors (9)

Mariia Fedorova
Jonas Sebulon Frydenberg
Victoria Handford
Victoria Ovedie Chruickshank Langø
Solveig Helene Willoch
Marthe Løken Midtgaard
Yves Scherrer
Petter Mæhlum
David Samuel

Citation Format

Fedorova, M., Frydenberg, J.S., Handford, V., Langø, V.O.C., Willoch, S.H., Midtgaard, M.L. et al. (2025). Multi-label Scandinavian Language Identification (SLIDE). https://arxiv.org/abs/2502.06692

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access