arXiv Open Access 2025

Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries

Tushar Pranav, Eshan Pandey, Austria Lyka Diane Bala, Aman Chadha, Indriyati Atmosukarto, +1 more

Abstract

Vision-Language Models (VLMs) excel in multimodal tasks but often exhibit Western-centric biases, limiting their effectiveness in culturally diverse regions like Southeast Asia (SEA). To address this, we introduce RICE-VL, a novel benchmark evaluating VLM cultural understanding across 11 ASEAN countries. RICE-VL includes over 28,000 human-curated Visual Question Answering (VQA) samples -- covering True or False, Fill-in-the-Blank, and open-ended formats -- and 1,000 image-bounding box pairs for Visual Grounding, annotated by culturally informed experts across 14 sub-ground categories. We propose SEA-LAVE, an extension of the LAVE metric, assessing textual accuracy, cultural alignment, and country identification. Evaluations of six open- and closed-source VLMs reveal significant performance gaps in low-resource countries and abstract cultural domains. The Visual Grounding task tests models' ability to localize culturally significant elements in complex scenes, probing spatial and contextual accuracy. RICE-VL exposes limitations in VLMs' cultural comprehension and highlights the need for inclusive model development to better serve diverse global populations.

Topics & Keywords

cs.CV cs.AI

Authors (6)

Tushar Pranav

Eshan Pandey

Austria Lyka Diane Bala

Aman Chadha

Indriyati Atmosukarto

Donny Soh Cheng Lock

Citation Format

Pranav, T., Pandey, E., Bala, A.L.D., Chadha, A., Atmosukarto, I., & Lock, D.S.C. (2025). Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries. https://arxiv.org/abs/2512.01419

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓