arXiv Open Access 2025

Evaluation of Cultural Competence of Vision-Language Models

Srishti Yadav Lauren Tilton Maria Antoniak Taylor Arnold Jiaang Li +8 others

Abstract

Modern vision-language models (VLMs) often fail at cultural competency evaluations and benchmarks. Given the diversity of applications built upon VLMs, there is renewed interest in understanding how they encode cultural nuances. While individual aspects of this problem have been studied, we still lack a comprehensive framework for systematically identifying and annotating the nuanced cultural dimensions present in images for VLMs. This position paper argues that foundational methodologies from visual culture studies (cultural studies, semiotics, and visual studies) are necessary for cultural analysis of images. Building upon this review, we propose a set of five frameworks, corresponding to cultural dimensions, that must be considered for a more complete analysis of the cultural competencies of VLMs.

Topics & Keywords

cs.CV cs.CL

Authors (13)

Srishti Yadav

Lauren Tilton

Maria Antoniak

Taylor Arnold

Jiaang Li

Siddhesh Milind Pawar

Antonia Karamolegkou

Stella Frank

Zhaochong An

Negar Rostamzadeh

Daniel Hershcovich

Serge Belongie

Ekaterina Shutova

Citation Format

Yadav, S., Tilton, L., Antoniak, M., Arnold, T., Li, J., Pawar, S.M. et al. (2025). Evaluation of Cultural Competence of Vision-Language Models. https://arxiv.org/abs/2505.22793

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓