Semantic Scholar | Open Access | 2023 | 1965 citations

MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, +17 others

Abstract

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of 28 open-source LMMs as well as the proprietary GPT-4V(ision) and Gemini highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V and Gemini Ultra only achieve accuracies of 56% and 59% respectively, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.

Authors (22)

Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu Su
Wenhu Chen

Citation Format

Yue, X., Ni, Y., Zhang, K., Zheng, T., Liu, R., Zhang, G. et al. (2023). MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI. https://doi.org/10.1109/CVPR52733.2024.00913

Journal Information
Publication Year: 2023
Language: en
Total Citations: 1965
Database Source: Semantic Scholar
DOI: 10.1109/CVPR52733.2024.00913
Access: Open Access