arXiv Open Access 2025

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

Aruna Gauba, Irene Pi, Yunze Man, Ziqi Pang, Vikram S. Adve, Yu-Xiong Wang

Abstract

We present AgMMU, a challenging real-world benchmark for evaluating and advancing vision-language models (VLMs) in the knowledge-intensive domain of agriculture. Unlike prior datasets that rely on crowdsourced prompts, AgMMU is distilled from 116,231 authentic dialogues between everyday growers and USDA-authorized Cooperative Extension experts. Through a three-stage pipeline (automated knowledge extraction, QA generation, and human verification), we construct (i) AgMMU, an evaluation set of 746 multiple-choice questions (MCQs) and 746 open-ended questions (OEQs), and (ii) AgBase, a development corpus of 57,079 multimodal facts covering five high-stakes agricultural topics: insect identification, species identification, disease categorization, symptom description, and management instruction. Benchmarking 12 leading VLMs reveals pronounced gaps in fine-grained perception and factual grounding. Open-source models trail proprietary ones by a wide margin. Simple fine-tuning on AgBase improves open-source model performance on challenging OEQs by up to 11.6% on average, narrowing this gap and motivating future research on better strategies for knowledge extraction and distillation from AgBase. We hope AgMMU stimulates research on domain-specific knowledge integration and trustworthy decision support in agricultural AI development.

Authors (6)

Aruna Gauba
Irene Pi
Yunze Man
Ziqi Pang
Vikram S. Adve
Yu-Xiong Wang

Citation Format

Gauba, A., Pi, I., Man, Y., Pang, Z., Adve, V.S., Wang, Y. (2025). AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark. https://arxiv.org/abs/2504.10568

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓