arXiv Open Access 2024

ColorFoil: Investigating Color Blindness in Large Vision and Language Models

Ahnaf Mozib Samin, M. Firoz Ahmed, Md. Mushtaq Shahriyar Rafee

Abstract

With the adoption of the Transformer architecture, large Vision and Language (V&L) models have shown promising performance, even in zero-shot settings. Several studies, however, indicate that these models lack robustness when dealing with complex linguistic and visual attributes. In this work, we introduce a novel V&L benchmark, ColorFoil, which uses color-related foils to assess the models' ability to perceive colors such as red, white, and green. We evaluate seven state-of-the-art V&L models, including CLIP, ViLT, GroupViT, and BridgeTower, in a zero-shot setting and present intriguing findings. The experimental evaluation indicates that ViLT and BridgeTower demonstrate much better color perception than CLIP and its variants and GroupViT. Moreover, CLIP-based models and GroupViT struggle to distinguish colors that are visually distinct to humans with normal color perception.
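
The foil-based evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes a foil is built by swapping each color word in a caption for a different color, and that a model is counted as correct when it scores the original caption above the foil. The color list, the function names, and the `score(image, text)` callable are all hypothetical placeholders.

```python
import random
import re

# Assumed color vocabulary; the abstract names only red, white, and green.
COLORS = ["red", "white", "green", "blue", "black", "yellow"]
_COLOR_RE = re.compile(r"\b(" + "|".join(COLORS) + r")\b", re.IGNORECASE)


def make_color_foil(caption, rng):
    """Return `caption` with every color word swapped for a different color.

    Replacement words are lowercase. Returns None when the caption contains
    no color word, since no foil can be built from it.
    """
    if not _COLOR_RE.search(caption):
        return None

    def swap(match):
        original = match.group(0).lower()
        return rng.choice([c for c in COLORS if c != original])

    return _COLOR_RE.sub(swap, caption)


def zero_shot_accuracy(examples, score):
    """Fraction of (image, caption, foil) triples where the caption outscores the foil.

    `score(image, text)` is a hypothetical stand-in for a model's image-text
    matching score; ties count as errors.
    """
    if not examples:
        raise ValueError("examples must be non-empty")
    correct = sum(score(img, cap) > score(img, foil) for img, cap, foil in examples)
    return correct / len(examples)
```

In practice `score` would wrap a real V&L model's image-text similarity, and `rng` would be a seeded `random.Random` instance so that the generated foils are reproducible.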

Topics & Keywords

cs.CV cs.CL

Authors (3)

Ahnaf Mozib Samin

M. Firoz Ahmed

Md. Mushtaq Shahriyar Rafee

Citation (APA)

Samin, A. M., Ahmed, M. F., & Rafee, M. M. S. (2024). ColorFoil: Investigating Color Blindness in Large Vision and Language Models. arXiv. https://arxiv.org/abs/2405.11685

Journal Information

Publication Year: 2024
Language: en
Source Database: arXiv
Access: Open Access ✓