Semantic Scholar Open Access 2022 14 sitasi

Measuring Harmful Representations in Scandinavian Language Models

Samia Touileb Debora Nozza

Lihat Sumber DOI

Abstrak

Scandinavian countries are perceived as role-models when it comes to gender equality. With the advent of pre-trained language models and their widespread usage, we investigate to what extent gender-based harmful and toxic content exists in selected Scandinavian language models. We examine nine models, covering Danish, Swedish, and Norwegian, by manually creating template-based sentences and probing the models for completion. We evaluate the completions using two methods for measuring harmful and toxic completions and provide a thorough analysis of the results. We show that Scandinavian pre-trained language models contain harmful and gender-based stereotypes with similar values across all languages. This finding goes against the general expectations related to gender equality in Scandinavian countries and shows the possible problematic outcomes of using such models in real-world settings. Warning: Some of the examples provided in this paper can be upsetting and offensive.

Topik & Kata Kunci

Computer Science

Penulis (2)

Samia Touileb

Debora Nozza

Format Sitasi

APA MLA BibTeX

Touileb, S., Nozza, D. (2022). Measuring Harmful Representations in Scandinavian Language Models. https://doi.org/10.48550/arXiv.2211.11678

Akses Cepat

Lihat di Sumber doi.org/10.48550/arXiv.2211.11678

Informasi Jurnal

Tahun Terbit: 2022
Bahasa: en
Total Sitasi: 14×
Sumber Database: Semantic Scholar
DOI: 10.48550/arXiv.2211.11678
Akses: Open Access ✓