arXiv Open Access 2025

ViLLa: A Neuro-Symbolic approach for Animal Monitoring

Harsha Koduri

Abstract

Monitoring animal populations in natural environments requires systems that can interpret both visual data and human language queries. This work introduces ViLLa (Vision-Language-Logic Approach), a neuro-symbolic framework designed for interpretable animal monitoring. ViLLa integrates three core components: a visual detection module for identifying animals and their spatial locations in images, a language parser for understanding natural language queries, and a symbolic reasoning layer that applies logic-based inference to answer those queries. Given an image and a question such as "How many dogs are in the scene?" or "Where is the buffalo?", the system grounds visual detections into symbolic facts and uses predefined rules to compute accurate answers related to count, presence, and location. Unlike end-to-end black-box models, ViLLa separates perception, understanding, and reasoning, offering modularity and transparency. The system was evaluated on a range of animal imagery tasks and demonstrates the ability to bridge visual content with structured, human-interpretable queries.
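The three-stage pipeline described in the abstract (detect, parse, reason) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The `Detection` type, the `VOCABULARY` label set, the regex-based `parse_query`, and the left/center/right region rule are all hypothetical stand-ins chosen for illustration. A real system would obtain detections from a trained detector rather than from hand-written boxes.

```python
import re
from dataclasses import dataclass

# Hypothetical label set; a real detector defines its own classes.
VOCABULARY = {"dog", "buffalo", "zebra"}


@dataclass(frozen=True)
class Detection:
    """One output of the (assumed) visual detection module."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def singular(word):
    """Return the vocabulary label matching a possibly plural word, else None."""
    for candidate in (word, word[:-1], word[:-2]):
        if candidate in VOCABULARY:
            return candidate
    return None


# Hand-written patterns standing in for the language parser.
PATTERNS = [
    ("count", re.compile(r"how many (\w+)")),
    ("location", re.compile(r"where (?:is|are) (?:the )?(\w+)")),
    ("presence", re.compile(r"(?:is|are) there (?:a |an |any )?(\w+)")),
]


def parse_query(question):
    """Map a question to an (intent, animal) pair."""
    text = question.lower()
    for intent, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            animal = singular(match.group(1))
            if animal:
                return intent, animal
    raise ValueError(f"unsupported question: {question!r}")


def ground(detections, image_width):
    """Turn detections into symbolic facts of the form (label, region)."""
    third = image_width / 3
    facts = []
    for det in detections:
        center_x = (det.box[0] + det.box[2]) / 2
        if center_x < third:
            region = "left"
        elif center_x < 2 * third:
            region = "center"
        else:
            region = "right"
        facts.append((det.label, region))
    return facts


def answer(question, detections, image_width):
    """Apply one fixed rule per intent to the grounded facts."""
    intent, animal = parse_query(question)
    facts = ground(detections, image_width)
    regions = [region for label, region in facts if label == animal]
    if intent == "count":
        return len(regions)
    if intent == "presence":
        return bool(regions)
    return regions  # location: one coarse region per matching animal
```

Because each stage is a separate function, the intermediate facts returned by `ground` can be inspected directly, which is the kind of transparency the abstract contrasts with end-to-end models.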

Topics & Keywords

cs.CV cs.AI

Authors (1)

Harsha Koduri

Citation

Koduri, H. (2025). ViLLa: A Neuro-Symbolic approach for Animal Monitoring. https://arxiv.org/abs/2506.14823

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓