arXiv Open Access 2023

Systematic Offensive Stereotyping (SOS) Bias in Language Models

Fatma Elsafoury

Abstract

In this paper, we propose a new metric to measure the SOS bias in language models (LMs). We then validate the SOS bias and investigate the effectiveness of removing it. Finally, we investigate the impact of the SOS bias in LMs on their performance and fairness in hate speech detection. Our results suggest that all the inspected LMs are SOS biased, and that the SOS bias reflects the online hate experienced by marginalized identities. The results also indicate that using debiasing methods from the literature worsens the SOS bias in LMs for some sensitive attributes while improving it for others. Finally, our results suggest that the SOS bias in the inspected LMs affects their fairness in hate speech detection, although there is no strong evidence that it affects hate speech detection performance.


Author (1)

Fatma Elsafoury

Citation Format

Elsafoury, F. (2023). Systematic Offensive Stereotyping (SOS) Bias in Language Models. https://arxiv.org/abs/2308.10684

Journal Information
Publication Year: 2023
Language: en
Source Database: arXiv
Access: Open Access ✓