Semantic Scholar Open Access 2023 1965 citations

MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue Yuansheng Ni Kai Zhang Tianyu Zheng Ruoqi Liu +17 others

Abstract

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of 28 open-source LMMs as well as the proprietary GPT-4V(ision) and Gemini highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V and Gemini Ultra only achieve accuracies of 56% and 59% respectively, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.

Topics & Keywords

Computer Science

Authors (22)

Xiang Yue

Yuansheng Ni

Kai Zhang

Tianyu Zheng

Ruoqi Liu

Ge Zhang

Samuel Stevens

Dongfu Jiang

Weiming Ren

Yuxuan Sun

Cong Wei

Botao Yu

Ruibin Yuan

Renliang Sun

Ming Yin

Boyuan Zheng

Zhenzhu Yang

Yibo Liu

Wenhao Huang

Huan Sun

Yu Su

Wenhu Chen

Citation Format

Yue, X., Ni, Y., Zhang, K., Zheng, T., Liu, R., Zhang, G. et al. (2023). MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI. https://doi.org/10.1109/CVPR52733.2024.00913

Journal Information

Publication Year: 2023
Language: en
Total Citations: 1965
Source Database: Semantic Scholar
DOI: 10.1109/CVPR52733.2024.00913
Access: Open Access ✓