arXiv Open Access 2025

Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness

Yuchen Song, Andong Chen, Wenxin Zhu, Kehai Chen, Xuefeng Bai, +2 others

Abstract

Cultural awareness has emerged as a critical capability for Multimodal Large Language Models (MLLMs). However, current benchmarks lack progressive difficulty in their task design and are deficient in cross-lingual tasks. Moreover, current benchmarks often use real-world images, each of which typically depicts a single culture, making these benchmarks relatively easy for MLLMs. To address these limitations, we propose C$^3$B (Comics Cross-Cultural Benchmark), a novel multicultural, multitask, and multilingual benchmark for cultural awareness. C$^3$B comprises over 2000 images and over 18000 QA pairs, organized into three tasks of progressive difficulty: from basic visual recognition, to higher-level cultural conflict understanding, and finally to cultural content generation. We evaluated 11 open-source MLLMs and found a significant gap between MLLM and human performance. This gap demonstrates that C$^3$B poses substantial challenges for current MLLMs and should encourage future research on advancing their cultural awareness.

Topics & Keywords

cs.CV cs.AI

Authors (7)

Yuchen Song

Andong Chen

Wenxin Zhu

Kehai Chen

Xuefeng Bai

Muyun Yang

Tiejun Zhao

Citation Format

Song, Y., Chen, A., Zhu, W., Chen, K., Bai, X., Yang, M., & Zhao, T. (2025). Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness. https://arxiv.org/abs/2510.00041

Journal Information

Year of Publication: 2025
Language: en
Database Source: arXiv
Access: Open Access ✓