Semantic Scholar · Open Access · 2023 · 33 citations

Prismer: A Vision-Language Model with Multi-Task Experts

Shikun Liu, Linxi (Jim) Fan, Edward Johns, Zhiding Yu, Chaowei Xiao, Anima Anandkumar


Abstract

Recent vision-language models have shown impressive multi-modal generation capabilities. However, they typically require training huge models on massive datasets. As a more scalable alternative, we introduce Prismer, a data- and parameter-efficient vision-language model that leverages an ensemble of task-specific experts. Prismer only requires training of a small number of components, with the majority of network weights inherited from multiple readily-available, pre-trained experts, and kept frozen during training. By leveraging experts from a wide range of domains, we show Prismer can efficiently pool this expert knowledge and adapt it to various vision-language reasoning tasks. In our experiments, we show that Prismer achieves fine-tuned and few-shot learning performance which is competitive with the current state of the art, whilst requiring up to two orders of magnitude less training data. Code is available at https://github.com/NVlabs/prismer.
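The abstract's central idea is that most weights are inherited from frozen pre-trained experts and only a few components are trained. The toy sketch below illustrates that bookkeeping. Everything in it is a hypothetical assumption: the `Component` class, the `trainable_fraction` and `sgd_step` helpers, the component names, the parameter counts, and the single-scalar stand-ins for weights. None of it is Prismer's code, and the numbers are not the model's actual sizes. The official repository is the reference for those.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One block of a hypothetical expert-ensemble model (illustration only)."""
    name: str
    num_params: int
    frozen: bool  # True: weights inherited from a pre-trained expert, never updated


def trainable_fraction(components: list[Component]) -> float:
    """Share of all parameters that the optimizer is allowed to update."""
    total = sum(c.num_params for c in components)
    trainable = sum(c.num_params for c in components if not c.frozen)
    return trainable / total


def sgd_step(weights: dict[str, float], grads: dict[str, float],
             components: list[Component], lr: float) -> None:
    """Apply one plain SGD update in place, skipping every frozen component."""
    frozen = {c.name for c in components if c.frozen}
    for name in weights:
        if name not in frozen:
            weights[name] -= lr * grads[name]


# Hypothetical component names and sizes, NOT Prismer's real figures.
model = [
    Component("depth_expert", 100_000_000, frozen=True),
    Component("segmentation_expert", 200_000_000, frozen=True),
    Component("vision_encoder", 300_000_000, frozen=True),
    Component("language_decoder", 350_000_000, frozen=True),
    Component("adaptors", 50_000_000, frozen=False),
]

print(f"trainable: {trainable_fraction(model):.1%}")  # trainable: 5.0%

# One scalar stand-in per component for its weights and gradient.
weights = {c.name: 1.0 for c in model}
grads = {c.name: 1.0 for c in model}
sgd_step(weights, grads, model, lr=0.25)
print(weights["adaptors"], weights["depth_expert"])  # 0.75 1.0
```

Only the unfrozen component moves after the update. In a real framework, the same effect is usually achieved by disabling gradients on the frozen modules and passing only the trainable parameters to the optimizer.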

Topics & Keywords

Computer Science

Authors (6)

Shikun Liu

Linxi (Jim) Fan

Edward Johns

Zhiding Yu

Chaowei Xiao

Anima Anandkumar

Citation (APA)

Liu, S., Fan, L., Johns, E., Yu, Z., Xiao, C., & Anandkumar, A. (2023). Prismer: A Vision-Language Model with Multi-Task Experts. https://www.semanticscholar.org/paper/f02d56e630986997e0aea3d92bf53e0f363ce401


Publication Information

Year Published: 2023
Language: en
Total Citations: 33
Database Source: Semantic Scholar
Access: Open Access