arXiv Open Access 2021

Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models

Tejas Srinivasan, Yonatan Bisk

Abstract

Numerous works have analyzed biases in vision models and pre-trained language models individually; however, less attention has been paid to how these biases interact in multimodal settings. This work extends text-based bias analysis methods to investigate multimodal language models, and analyzes the intra- and inter-modality associations and biases these models learn. Specifically, we demonstrate that VL-BERT (Su et al., 2020) exhibits gender biases, often preferring to reinforce a stereotype over faithfully describing the visual scene. We demonstrate these findings on a controlled case study and extend them to a larger set of stereotypically gendered entities.

Topics & Keywords

cs.CL

Citation Format

Srinivasan, T., & Bisk, Y. (2021). Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models. https://arxiv.org/abs/2104.08666

Publication Information

Publication Year: 2021
Language: English
Database Source: arXiv
Access: Open Access