arXiv Open Access 2026

CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation

Chaeyun Kim, YongTaek Lim, Kihyun Kim, Junghwan Kim, Minwoo Kim

Abstract

Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in local culture and law, creating a critical blind spot in LLM safety evaluation. To address this gap, we introduce CAGE (Culturally Adaptive Generation), a framework that systematically adapts the adversarial intent of proven red-teaming prompts to new cultural contexts. At the core of CAGE is the Semantic Mold, a novel approach that disentangles a prompt's adversarial structure from its cultural content. This approach enables the modeling of realistic, localized threats rather than testing for simple jailbreaks. As a representative example, we demonstrate our framework by creating KoRSET, a Korean benchmark, which proves more effective at revealing vulnerabilities than direct translation baselines. CAGE offers a scalable solution for developing meaningful, context-aware safety benchmarks across diverse cultures. Our dataset and evaluation rubrics are publicly available at https://github.com/selectstar-ai/CAGE-paper. (WARNING: This paper contains model outputs that can be offensive in nature.)

Topics & Keywords

cs.CY cs.AI

Citation Format

Kim, C., Lim, Y., Kim, K., Kim, J., & Kim, M. (2026). CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation. https://arxiv.org/abs/2602.20170

Journal Information

Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access ✓