arXiv Open Access 2024

Leveraging Vision Language Models for Specialized Agricultural Tasks

Muhammad Arbab Arshad, Talukder Zaki Jubery, Tirtho Roy, Rim Nassiri, Asheesh K. Singh, +6 others

Abstract

As Vision Language Models (VLMs) become increasingly accessible to farmers and agricultural experts, there is a growing need to evaluate their potential in specialized tasks. We present AgEval, a comprehensive benchmark for assessing VLMs' capabilities in plant stress phenotyping, offering a solution to the challenge of limited annotated data in agriculture. Our study explores how general-purpose VLMs can be leveraged for domain-specific tasks with only a few annotated examples, providing insights into their behavior and adaptability. AgEval encompasses 12 diverse plant stress phenotyping tasks, evaluating zero-shot and few-shot in-context learning performance of state-of-the-art models including Claude, GPT, Gemini, and LLaVA. Our results demonstrate VLMs' rapid adaptability to specialized tasks, with the best-performing model showing an increase in F1 scores from 46.24% to 73.37% in 8-shot identification. To quantify performance disparities across classes, we introduce metrics such as the coefficient of variation (CV), revealing that VLMs' training impacts classes differently, with CV ranging from 26.02% to 58.03%. We also find that strategic example selection enhances model reliability, with exact category examples improving F1 scores by 15.38% on average. AgEval establishes a framework for assessing VLMs in agricultural applications, offering valuable benchmarks for future evaluations. Our findings suggest that VLMs, with minimal few-shot examples, show promise as a viable alternative to traditional specialized models in plant stress phenotyping, while also highlighting areas for further refinement. Results and benchmark details are available at: https://github.com/arbab-ml/AgEval
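The abstract reports a coefficient of variation (CV) across classes but does not state the formula. The following is a minimal sketch, assuming the standard definition (standard deviation divided by mean, expressed as a percentage) applied to per-class F1 scores. The function name `coefficient_of_variation`, the use of the population standard deviation, and the sample scores are all illustrative assumptions, not taken from the paper or the AgEval repository.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Return the CV in percent: 100 * population std dev / mean.

    Assumption: the population standard deviation is used; the paper
    may use the sample standard deviation instead.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    mu = mean(scores)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * pstdev(scores) / mu


# Hypothetical per-class F1 scores (in percent), not results from the paper.
per_class_f1 = [80.0, 60.0, 40.0, 20.0]
cv = coefficient_of_variation(per_class_f1)
print(f"CV across classes: {cv:.2f}%")
```

Under this definition, a lower CV means the model performs more evenly across classes, while a higher CV indicates that a few classes dominate or lag behind the average.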

Authors (11)

Muhammad Arbab Arshad

Talukder Zaki Jubery

Tirtho Roy

Rim Nassiri

Asheesh K. Singh

Arti Singh

Chinmay Hegde

Baskar Ganapathysubramanian

Aditya Balu

Adarsh Krishnamurthy

Soumik Sarkar

Citation Format

Arshad, M.A., Jubery, T.Z., Roy, T., Nassiri, R., Singh, A.K., Singh, A. et al. (2024). Leveraging Vision Language Models for Specialized Agricultural Tasks. https://arxiv.org/abs/2407.19617

Journal Information
Publication Year
2024
Language
en
Source Database
arXiv
Access
Open Access ✓