arXiv Open Access 2026

CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation

Chaeyun Kim, YongTaek Lim, Kihyun Kim, Junghwan Kim, Minwoo Kim
Abstract

Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in local culture and law, creating a critical blind spot in LLM safety evaluation. To address this gap, we introduce CAGE (Culturally Adaptive Generation), a framework that systematically adapts the adversarial intent of proven red-teaming prompts to new cultural contexts. At the core of CAGE is the Semantic Mold, a novel approach that disentangles a prompt's adversarial structure from its cultural content. This approach enables the modeling of realistic, localized threats rather than testing for simple jailbreaks. As a representative example, we demonstrate our framework by creating KoRSET, a Korean benchmark, which proves more effective at revealing vulnerabilities than direct translation baselines. CAGE offers a scalable solution for developing meaningful, context-aware safety benchmarks across diverse cultures. Our dataset and evaluation rubrics are publicly available at https://github.com/selectstar-ai/CAGE-paper. (WARNING: This paper contains model outputs that can be offensive in nature.)

Citation Format

Kim, C., Lim, Y., Kim, K., Kim, J., Kim, M. (2026). CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation. https://arxiv.org/abs/2602.20170

Journal Information

Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access