arXiv Open Access 2024

With a Grain of SALT: Are LLMs Fair Across Social Dimensions?

Samee Arif, Zohaib Khan, Maaidah Kaleem, Suhaib Rashid, Agha Ali Raza, Awais Athar

Abstract

This paper presents a systematic analysis of biases in open-source Large Language Models (LLMs) across gender, religion, and race. Our study evaluates bias in smaller-scale Llama and Gemma models using the SALT (Social Appropriateness in LLM-Generated Text) dataset, which incorporates five distinct bias triggers: General Debate, Positioned Debate, Career Advice, Problem Solving, and CV Generation. To quantify bias, we measure win rates in General Debate and the assignment of negative roles in Positioned Debate. For real-world use cases, such as Career Advice, Problem Solving, and CV Generation, we anonymize the outputs to remove explicit demographic identifiers and use DeepSeek-R1 as an automated evaluator. We also address inherent biases in LLM-based evaluation, including evaluation bias, positional bias, and length bias, and validate our results through human evaluations. Our findings reveal consistent polarization across models, with certain demographic groups receiving systematically favorable or unfavorable treatment. By introducing SALT, we provide a comprehensive benchmark for bias analysis and underscore the need for robust bias mitigation strategies in the development of equitable AI systems.
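The win-rate metric described for the General Debate trigger can be sketched as a simple per-group tally. The helper below is illustrative only: the group labels and outcome data are hypothetical, not drawn from the paper.

```python
from collections import Counter

def win_rates(outcomes):
    """Compute per-group win rates from (group, won) pairs."""
    wins, totals = Counter(), Counter()
    for group, won in outcomes:
        totals[group] += 1
        if won:
            wins[group] += 1
    return {g: wins[g] / totals[g] for g in totals}

# Hypothetical judged debate outcomes: (demographic group, did that side win?)
outcomes = [
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
]
print(win_rates(outcomes))
```

A systematic gap between groups' win rates under otherwise matched prompts would indicate the kind of polarization the abstract reports.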


Authors (6)

Samee Arif

Zohaib Khan

Maaidah Kaleem

Suhaib Rashid

Agha Ali Raza

Awais Athar

Citation Format

Arif, S., Khan, Z., Kaleem, M., Rashid, S., Raza, A. A., & Athar, A. (2024). With a Grain of SALT: Are LLMs Fair Across Social Dimensions? https://arxiv.org/abs/2410.12499

Journal Information
Publication Year
2024
Language
en
Source Database
arXiv
Access
Open Access ✓