arXiv Open Access 2026

Dialect and Gender Bias in YouTube's Spanish Captioning System

Iris Dania Jimenez, Christoph Kern

Abstract

Spanish is the official language of twenty-one countries and is spoken by over 441 million people. Naturally, there are many variations in how Spanish is spoken across these countries. Media platforms such as YouTube rely on automatic speech recognition systems to make their content accessible to different groups of users. However, YouTube offers only one option for automatically generating captions in Spanish. This raises the question: could this captioning system be biased against certain Spanish dialects? This study examines the potential biases in YouTube's automatic captioning system by analyzing its performance across various Spanish dialects. By comparing the quality of captions for female and male speakers from different regions, we identify systematic disparities which can be attributed to specific dialects. Our study provides further evidence that algorithmic technologies deployed on digital platforms need to be calibrated to the diverse needs and experiences of their user populations.

Authors (2)

Iris Dania Jimenez

Christoph Kern

Citation Format

Jimenez, I.D., Kern, C. (2026). Dialect and Gender Bias in YouTube's Spanish Captioning System. https://arxiv.org/abs/2602.24002

Journal Information

Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access