arXiv Open Access 2025

EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models

Abhay Gupta, Jacob Cheung, Philip Meng, Shayan Sayyed, Austen Liao, +2 others

Abstract

The diversity of human language, shaped by social, cultural, and regional influences, presents significant challenges for natural language processing (NLP) systems. Existing benchmarks often overlook intra-language variation, leaving speakers of non-standard dialects underserved. To address this gap, we introduce EnDive (English Diversity), a benchmark that evaluates five widely used large language models (LLMs) across tasks in language understanding, algorithmic reasoning, mathematics, and logic. Our framework translates Standard American English datasets into five underrepresented dialects using few-shot prompting with verified examples from native speakers, and compares these translations against rule-based methods via fluency assessments, preference tests, and semantic similarity metrics. Human evaluations confirm high translation quality, with average scores of at least 6.02/7 for faithfulness, fluency, and formality. By filtering out near-identical translations, we create a challenging dataset that reveals significant performance disparities: models consistently underperform on dialectal inputs compared to Standard American English. EnDive thus advances dialect-aware NLP by uncovering model biases and promoting more equitable language technologies.
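The filtering step described in the abstract — discarding dialect translations that are near-identical to their Standard American English sources — can be illustrated with a minimal sketch. The paper does not specify its exact similarity measure; the snippet below is a hypothetical stand-in using stdlib `difflib.SequenceMatcher` surface similarity, with an assumed threshold of 0.9.

```python
from difflib import SequenceMatcher

def filter_near_identical(pairs, threshold=0.9):
    """Keep only (sae, dialect) pairs whose surface similarity falls
    below the threshold, i.e. translations that actually differ from
    the Standard American English source."""
    kept = []
    for sae, dialect in pairs:
        ratio = SequenceMatcher(None, sae.lower(), dialect.lower()).ratio()
        if ratio < threshold:
            kept.append((sae, dialect))
    return kept

# Illustrative pairs: the second translation is unchanged and is dropped.
pairs = [
    ("She is going to the store.", "She finna go to the store."),
    ("The answer is four.", "The answer is four."),
]
filtered = filter_near_identical(pairs)
print(len(filtered))  # 1
```

In the full benchmark an embedding-based semantic similarity metric would be a more robust choice than surface overlap, since dialectal rewrites can diverge heavily in form while preserving meaning.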

Topics & Keywords

Authors (7)

Abhay Gupta
Jacob Cheung
Philip Meng
Shayan Sayyed
Austen Liao
Kevin Zhu
Sean O'Brien

Citation Format

Gupta, A., Cheung, J., Meng, P., Sayyed, S., Liao, A., Zhu, K., & O'Brien, S. (2025). EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models. https://arxiv.org/abs/2504.07100

Journal Information
Publication Year
2025
Language
en
Database Source
arXiv
Access
Open Access ✓