arXiv Open Access 2025

Occlusion Robustness of CLIP for Military Vehicle Classification

Jan Erik van Woerden, Gertjan Burghouts, Lotte Nijskens, Alma M. Liezenga, Sabina van Rooij, +2 others

Abstract

Vision-language models (VLMs) like CLIP enable zero-shot classification by aligning images and text in a shared embedding space, offering advantages for defense applications with scarce labeled data. However, CLIP's robustness in challenging military environments, with partial occlusion and degraded signal-to-noise ratio (SNR), remains underexplored. We investigate CLIP variants' robustness to occlusion using a custom dataset of 18 military vehicle classes and evaluate using Normalized Area Under the Curve (NAUC) across occlusion percentages. Four key insights emerge: (1) Transformer-based CLIP models consistently outperform CNNs; (2) fine-grained, dispersed occlusions degrade performance more than larger contiguous occlusions; (3) despite improved accuracy, the performance of linear-probed models drops sharply at around 35% occlusion; and (4) when the model's backbone is finetuned, this performance drop occurs only at more than 60% occlusion. These results underscore the importance of occlusion-specific augmentations during training and the need for further exploration into patch-level sensitivity and architectural resilience for real-world deployment of CLIP.
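
The abstract names NAUC as the evaluation metric but does not define it. The sketch below shows one plausible reading, offered as an assumption rather than the authors' definition: the trapezoidal area under the accuracy-versus-occlusion curve, divided by the width of the occlusion range, so that a model holding 100% accuracy at every occlusion level would score 1.0. The function name `normalized_auc` and the accuracy values are hypothetical and are not taken from the paper.

```python
def normalized_auc(occlusion_levels, accuracies):
    """Area under the accuracy-vs-occlusion curve, divided by the occlusion range.

    Assumed definition (not confirmed by the source): trapezoidal integration
    over occlusion fractions, normalized so constant accuracy 1.0 yields 1.0.
    """
    if len(occlusion_levels) != len(accuracies):
        raise ValueError("occlusion_levels and accuracies must have equal length")
    if len(occlusion_levels) < 2:
        raise ValueError("need at least two points to integrate")

    # Sort by occlusion level so the integration runs left to right.
    points = sorted(zip(occlusion_levels, accuracies))

    # Trapezoidal rule over consecutive pairs of points.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)

    span = points[-1][0] - points[0][0]
    if span <= 0:
        raise ValueError("occlusion levels must cover a non-zero range")
    return area / span


# Hypothetical accuracies at five occlusion fractions (illustrative only).
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
accs = [0.9, 0.8, 0.6, 0.3, 0.1]
print(f"NAUC = {normalized_auc(levels, accs):.3f}")
```

Under this reading, a single number summarizes the whole degradation curve, which makes it possible to compare models whose accuracy collapses at different occlusion percentages (for example, around 35% versus above 60%).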

Topics & Keywords

cs.CV cs.AI

Authors (7)

Jan Erik van Woerden

Gertjan Burghouts

Lotte Nijskens

Alma M. Liezenga

Sabina van Rooij

Frank Ruis

Hugo J. Kuijf

Citation Format

Woerden, J.E.v., Burghouts, G., Nijskens, L., Liezenga, A.M., Rooij, S.v., Ruis, F. et al. (2025). Occlusion Robustness of CLIP for Military Vehicle Classification. https://arxiv.org/abs/2508.20760

Journal Information

Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓