arXiv Open Access 2025

Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness

Yuchen Song, Andong Chen, Wenxin Zhu, Kehai Chen, Xuefeng Bai, +2 others

Abstract

Cultural awareness has emerged as a critical capability for Multimodal Large Language Models (MLLMs). However, current benchmarks lack progressive difficulty in their task design and offer few cross-lingual tasks. Moreover, they often rely on real-world images, each of which typically depicts a single culture, making these benchmarks relatively easy for MLLMs. To address these limitations, we propose C$^3$B (Comics Cross-Cultural Benchmark), a novel multicultural, multitask, and multilingual benchmark for cultural awareness. C$^3$B comprises over 2000 images and over 18000 QA pairs, organized into three tasks of increasing difficulty: basic visual recognition, higher-level cultural conflict understanding, and finally cultural content generation. We evaluated 11 open-source MLLMs and found a significant gap between their performance and human performance. This gap demonstrates that C$^3$B poses substantial challenges for current MLLMs and should encourage future research on their cultural awareness capabilities.

Authors (7)

Yuchen Song
Andong Chen
Wenxin Zhu
Kehai Chen
Xuefeng Bai
Muyun Yang
Tiejun Zhao

Citation Format

Song, Y., Chen, A., Zhu, W., Chen, K., Bai, X., Yang, M. et al. (2025). Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness. https://arxiv.org/abs/2510.00041

Journal Information
Publication Year
2025
Language
en
Source Database
arXiv
Access
Open Access