Semantic Scholar · Open Access · 2024 · 224 citations

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Runqi Qiao Qiuna Tan Guanting Dong Minhui Wu Chong Sun +13 more

Abstract

Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduce WE-MATH, the first benchmark specifically designed to explore the problem-solving principles beyond end-to-end performance. We meticulously collect and categorize 6.5K visual math problems, spanning 67 hierarchical knowledge concepts and five layers of knowledge granularity. We decompose composite problems into sub-problems according to the required knowledge concepts and introduce a novel four-dimensional metric, namely Insufficient Knowledge (IK), Inadequate Generalization (IG), Complete Mastery (CM), and Rote Memorization (RM), to hierarchically assess inherent issues in LMMs' reasoning process. With WE-MATH, we conduct a thorough evaluation of existing LMMs in visual mathematical reasoning and reveal a negative correlation between solving steps and problem-specific performance. We confirm the IK issue of LMMs can be effectively improved via knowledge augmentation strategies. More notably, the primary challenge of GPT-4o has significantly transitioned from IK to IG, establishing it as the first LMM advancing towards the knowledge generalization stage. In contrast, other LMMs exhibit a marked inclination towards Rote Memorization - they correctly solve composite problems involving multiple knowledge concepts yet fail to answer sub-problems. We anticipate that WE-MATH will open new pathways for advancements in visual mathematical reasoning for LMMs. The WE-MATH data and evaluation code are available at https://github.com/We-Math/We-Math.
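The four-dimensional metric can be illustrated with a short sketch. The exact decision rules are defined in the paper itself; the logic below is a plausible reconstruction from the metric names and the abstract's description of Rote Memorization (composite problem solved, sub-problems failed), assuming each composite problem is paired with sub-problems for its underlying knowledge concepts.

```python
def classify(composite_correct: bool, subproblems_correct: list[bool]) -> str:
    """Classify one LMM response pattern into IK / IG / CM / RM (sketch)."""
    all_subs = all(subproblems_correct)
    if composite_correct and all_subs:
        return "CM"   # Complete Mastery: composite and all sub-problems solved
    if composite_correct and not all_subs:
        return "RM"   # Rote Memorization: composite solved despite failed sub-problems
    if not composite_correct and all_subs:
        return "IG"   # Inadequate Generalization: knows the concepts, fails to combine them
    return "IK"       # Insufficient Knowledge: fails the underlying sub-problems

# Example: a model answers the composite question but misses a sub-problem
print(classify(True, [True, False]))  # → RM
```

Under this reading, GPT-4o's reported shift from IK to IG means it now tends to solve the individual sub-problems and mainly struggles with combining them.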

Authors (18)

Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma Gongque, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

Citation Format

Qiao, R., Tan, Q., Dong, G., Wu, M., Sun, C., Song, X. et al. (2024). We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?. https://doi.org/10.48550/arXiv.2407.01284

Publication Information
Year Published
2024
Language
en
Total Citations
224
Source Database
Semantic Scholar
DOI
10.48550/arXiv.2407.01284
Access
Open Access ✓