arXiv Open Access 2025

Culture Cartography: Mapping the Landscape of Cultural Knowledge

Caleb Ziems, William Held, Jane Yu, Amir Goldberg, David Grusky, Diyi Yang

Abstract

To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. How can we find knowledge that is (1) salient to in-group users but (2) unknown to LLMs? The most common solutions are single-initiative: either researchers define challenging questions that users passively answer (traditional annotation), or users actively produce data that researchers structure as benchmarks (knowledge extraction). This process would benefit from mixed-initiative collaboration, where users guide the process to meaningfully reflect their cultures, and LLMs steer it toward more challenging questions that meet the researchers' goals. We propose a mixed-initiative methodology called CultureCartography. Here, an LLM initializes annotation with questions for which it has low-confidence answers, making explicit both its prior knowledge and the gaps therein. This allows a human respondent to fill these gaps and steer the model toward salient topics through direct edits. We implement this methodology as a tool called CultureExplorer. Compared to a baseline where humans answer LLM-proposed questions, we find that CultureExplorer more effectively produces knowledge that leading models like DeepSeek R1 and GPT-4o are missing, even with web search. Fine-tuning on this data boosts the accuracy of Llama-3.1-8B by up to 19.2% on related culture benchmarks.
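The selection step described in the abstract, where the LLM proposes questions and surfaces those it answers with low confidence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' CultureExplorer implementation: confidence is approximated here by self-consistency (the share of sampled answers that agree with the most common one), the sampled answers are supplied as toy placeholder data rather than drawn from a real model, and the names `CandidateQuestion`, `self_consistency_confidence`, `select_low_confidence`, and the 0.6 threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CandidateQuestion:
    """A question proposed by the LLM, with answers sampled from it.

    Hypothetical structure for illustration; in practice the answers
    would come from repeated sampling of a real model.
    """
    text: str
    sampled_answers: list[str]


def self_consistency_confidence(answers: list[str]) -> float:
    """Assumed confidence proxy: share of answers matching the most common one."""
    if not answers:
        return 0.0
    normalized = [a.strip().lower() for a in answers]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


def select_low_confidence(
    questions: list[CandidateQuestion], threshold: float = 0.6
) -> list[tuple[str, float]]:
    """Keep questions scored below `threshold` (hypothetical cutoff), least confident first."""
    scored = [(q.text, self_consistency_confidence(q.sampled_answers)) for q in questions]
    low = [(text, conf) for text, conf in scored if conf < threshold]
    return sorted(low, key=lambda pair: pair[1])


# Toy inputs (placeholders, not real model outputs or cultural data).
questions = [
    CandidateQuestion("Toy question A", ["x", "x", "x", "x"]),  # confidence 1.0
    CandidateQuestion("Toy question B", ["x", "y", "x", "z"]),  # confidence 0.5
    CandidateQuestion("Toy question C", ["w", "x", "y", "z"]),  # confidence 0.25
]

for text, conf in select_low_confidence(questions):
    print(f"{conf:.2f}  {text}")
```

Questions that pass the filter would then be shown to a human respondent, who fills the gaps in the model's answers and edits the questions toward topics salient to their culture; that interactive step is not modeled in this sketch.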

Authors (6)

Caleb Ziems
William Held
Jane Yu
Amir Goldberg
David Grusky
Diyi Yang

Citation

Ziems, C., Held, W., Yu, J., Goldberg, A., Grusky, D., Yang, D. (2025). Culture Cartography: Mapping the Landscape of Cultural Knowledge. https://arxiv.org/abs/2510.27672

Publication Information
Year published: 2025
Language: English
Database source: arXiv
Access: Open Access