arXiv Open Access 2025

Designing Psychometric Bias Measures for ChatBots: An Application to Racial Bias Measurement

Mouhacine Benosman
Lihat Sumber

Abstrak

Artificial intelligence (AI), particularly in the form of large language models (LLMs) or chatbots, has become increasingly integrated into our daily lives. In the past five years, several LLMs have been introduced, including ChatGPT by OpenAI, Claude by Anthropic, and Llama by Meta, among others. These models have the potential to be employed across a wide range of human-machine interaction applications, such as chatbots for information retrieval, assistance in corporate hiring decisions, college admissions, financial loan approvals, parole determinations, and even in medical fields like psychotherapy delivered through chatbots. The key question is whether these chatbots will interact with humans in a bias-free manner or if they will further reinforce the existing pathological biases present in human-to-human interactions. If the latter is true, then how can we rigorously measure these biases? We address this challenge by introducing STAMP-LLM (Standardized Test and Assessment Measurement Protocol for LLMs), a psychometric-based principled two-phase framework for designing psychometric measures to evaluate chatbot biases: (i) a Definitional phase for construct mapping, item development, and expert review; and (ii) a Data/Analysis phase for protocol control (prompts/decoding), automated sampling, pre-specified scoring, and basic reliability/validity checks. We illustrate STAMP-LLM on racial bias using one explicit and two implicit measures.

Topik & Kata Kunci

Penulis (1)

M

Mouhacine Benosman

Format Sitasi

Benosman, M. (2025). Designing Psychometric Bias Measures for ChatBots: An Application to Racial Bias Measurement. https://arxiv.org/abs/2509.13324

Akses Cepat

Lihat di Sumber
Informasi Jurnal
Tahun Terbit
2025
Bahasa
en
Sumber Database
arXiv
Akses
Open Access ✓