arXiv Open Access 2026

AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models

Jiarui Zhang, Junqi Hu, Zurong Mai, Yuhang Chen, Shuohong Lou, +6 others
Abstract

Agricultural multimodal reasoning requires robust spatial understanding across varying scales, from ground-level close-ups to top-down UAV and satellite imagery. Existing Multimodal Large Language Models (MLLMs) suffer from a significant "terrestrial-centric" bias, which causes scale confusion and logic drift during complex agricultural planning. To address this, we introduce AgroOmni (288K), the first large-scale multi-view training corpus designed to capture the diverse spatial topologies and scales of modern precision agriculture. Built on this dataset, we propose AgroNVILA, an MLLM that uses a novel Perception-Reasoning Decoupling (PRD) architecture. On the perception side, a View-Conditioned Meta-Net (VCMN) injects macroscopic spatial context into visual tokens, resolving scale ambiguities with minimal computational overhead. On the reasoning side, Agriculture-aware Relative Policy Optimization (ARPO) uses reinforcement learning to align the model's decision-making with expert agricultural logic and to prevent statistical shortcuts. Extensive experiments demonstrate that AgroNVILA outperforms state-of-the-art MLLMs, achieving significant improvements (+15.18%) in multi-altitude agricultural reasoning, which reflects its robust capability for holistic agricultural spatial planning.
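
The abstract names the two halves of the PRD design but does not give their equations, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes (1) that view conditioning can be approximated by a small network that maps a view descriptor (ground, UAV, satellite) to a bias added to every visual token, and (2) that the "relative" part of ARPO resembles group-normalized advantages as used in GRPO-style methods. All function names, dimensions, and the one-hot view encoding are hypothetical.

```python
import math
import random

# Hypothetical one-hot view descriptors; the paper's actual encoding is not given.
VIEWS = {
    "ground": [1.0, 0.0, 0.0],
    "uav": [0.0, 1.0, 0.0],
    "satellite": [0.0, 0.0, 1.0],
}


def make_meta_net(view_dim, hidden_dim, token_dim, seed=0):
    """Randomly initialise a two-layer MLP (assumed stand-in for a meta-net)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(view_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)] for _ in range(token_dim)]
    return w1, w2


def meta_net_bias(view, params):
    """Map a view descriptor to a bias vector in visual-token space."""
    w1, w2 = params
    hidden = [max(0.0, sum(w * v for w, v in zip(row, view))) for row in w1]  # ReLU
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def condition_tokens(tokens, view, params):
    """Add the same view-dependent bias to every visual token (assumed additive)."""
    bias = meta_net_bias(view, params)
    return [[t + b for t, b in zip(token, bias)] for token in tokens]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one group of sampled answers (GRPO-style assumption)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Perception side: condition five 4-dimensional tokens on the UAV view.
params = make_meta_net(view_dim=3, hidden_dim=8, token_dim=4)
tokens = [[0.0] * 4 for _ in range(5)]
uav_tokens = condition_tokens(tokens, VIEWS["uav"], params)

# Reasoning side: score four sampled answers relative to their group.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

In this sketch the conditioning costs one small MLP pass per image rather than per token, which is one plausible reading of "minimal computational overhead"; the group-normalized advantages sum to zero, so answers are rewarded only relative to the other samples for the same prompt.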

Authors (11)

Jiarui Zhang
Junqi Hu
Zurong Mai
Yuhang Chen
Shuohong Lou
Henglian Huang
Lingyuan Zhao
Jianxi Huang
Yutong Lu
Haohuan Fu
Juepeng Zheng

Citation Format

Zhang, J., Hu, J., Mai, Z., Chen, Y., Lou, S., Huang, H. et al. (2026). AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models. https://arxiv.org/abs/2603.14342

Journal Information
Publication Year: 2026
Language: en
Source Database: arXiv
Access: Open Access