DOAJ Open Access 2026

AG-CLIP: Attribute-Guided CLIP for Zero-Shot Fine-Grained Recognition

Jamil Ahmad, Mustaqeem Khan, Wail Guiaeab, Abdulmotaleb Elsaddik, Giulia De Masi, Fakhri Karray

Abstract

Zero-shot fine-grained recognition is challenging because of the high visual similarity between classes and the weak encoding of fine-grained features in embedding models. In this work, we present attribute-guided Contrastive Language-Image Pre-training (AG-CLIP), a CLIP model extended with an additional attribute encoder. Our approach first identifies relevant visual attributes from textual class descriptions using an attribute-mining module that leverages the large language model (LLM) GPT-4o. These attributes are then used to construct prompts for an open-vocabulary object/region detector, which extracts the corresponding image regions. Through a context-attribute fusion module, the attribute text and these focused image regions then guide CLIP to attend to the discriminative attributes during fine-tuning. This attribute-guided attention mechanism allows CLIP to disambiguate fine-grained classes effectively by highlighting their distinctive attributes, without requiring fine-tuning or additional training data for unseen classes. We evaluate our approach on the CUB-200-2011 and plant disease datasets, achieving 73.3% and 84.6% accuracy, respectively. Our method achieves state-of-the-art zero-shot performance, outperforming prior methods that rely on external knowledge bases or complex meta-learning strategies. These strong results demonstrate the effectiveness of injecting generic attribute awareness into powerful vision-language models such as CLIP for zero-shot fine-grained recognition.
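The abstract does not give implementation details. The sketch below is only a rough illustration of the general idea of attribute-guided zero-shot scoring. It assumes that the image, the detected attribute regions, and the text prompts have already been embedded into a shared CLIP-style space, with toy vectors standing in for real encoder outputs. CLIP-style zero-shot classification compares an image embedding against class-text embeddings by cosine similarity. Here, that global image embedding is first mixed with region embeddings. Each region is weighted by how well it matches its own attribute prompt. The names `fuse_with_attributes` and `zero_shot_predict`, the softmax temperature, and the mixing weight `alpha` are all hypothetical. This is not the authors' context-attribute fusion module.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm, as CLIP does before comparing embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def softmax(scores: Sequence[float], temperature: float = 0.1) -> Vector:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_attributes(
    image_emb: Sequence[float],
    region_embs: List[Sequence[float]],
    attribute_embs: List[Sequence[float]],
    alpha: float = 0.5,
) -> Vector:
    """Hypothetical attribute-guided fusion (an illustration, not the paper's module).

    region_embs[i] is assumed to embed the image region a detector returned for
    attribute prompt i, and attribute_embs[i] is that prompt's text embedding.
    Regions that match their attribute text closely receive larger attention
    weights; the weighted region average is mixed with the global image
    embedding using the weight alpha.
    """
    if len(region_embs) != len(attribute_embs) or not region_embs:
        raise ValueError("need exactly one region per attribute prompt")
    weights = softmax([cosine(r, a) for r, a in zip(region_embs, attribute_embs)])
    unit_regions = [normalize(r) for r in region_embs]
    attended = [
        sum(w * r[d] for w, r in zip(weights, unit_regions))
        for d in range(len(image_emb))
    ]
    fused = [(1 - alpha) * g + alpha * t for g, t in zip(normalize(image_emb), attended)]
    return normalize(fused)


def zero_shot_predict(image_emb: Sequence[float], class_embs: Dict[str, Sequence[float]]) -> str:
    """Pick the class whose text embedding is most similar to the image embedding."""
    return max(class_embs, key=lambda name: cosine(image_emb, class_embs[name]))


if __name__ == "__main__":
    # Toy 3-D vectors standing in for real CLIP image/text encoder outputs.
    classes = {"class_a": [1.0, 1.0, 0.0], "class_b": [1.0, 0.0, 1.0]}
    image = [1.0, 0.1, 0.2]
    regions = [[0.0, 1.0, 0.0], [0.2, 0.0, 1.0]]
    attributes = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
    print("global only:", zero_shot_predict(image, classes))
    fused = fuse_with_attributes(image, regions, attributes)
    print("attribute-guided:", zero_shot_predict(fused, classes))
```

With these toy vectors, the global embedding alone is slightly closer to `class_b`. The first region matches its attribute prompt exactly and receives nearly all of the attention weight at temperature 0.1. The second, poorly matching region is almost ignored. The fused embedding is therefore classified as `class_a`. Weighting regions by their agreement with the attribute text is one simple way to let a distinctive attribute override an ambiguous global view.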

Topics & Keywords

Electronic computers. Computer science; Information technology

Authors (6)

Jamil Ahmad

Mustaqeem Khan

Wail Guiaeab

Abdulmotaleb Elsaddik

Giulia De Masi

Fakhri Karray

Citation (APA)

Ahmad, J., Khan, M., Guiaeab, W., Elsaddik, A., De Masi, G., & Karray, F. (2026). AG-CLIP: Attribute-Guided CLIP for Zero-Shot Fine-Grained Recognition. IEEE Open Journal of the Computer Society. https://doi.org/10.1109/OJCS.2026.3654171

Journal Information

Publication Year: 2026
Database Source: DOAJ
DOI: 10.1109/OJCS.2026.3654171
Access: Open Access