arXiv Open Access 2025

Occlusion Robustness of CLIP for Military Vehicle Classification

Jan Erik van Woerden, Gertjan Burghouts, Lotte Nijskens, Alma M. Liezenga, Sabina van Rooij, Frank Ruis, Hugo J. Kuijf

Abstract

Vision-language models (VLMs) like CLIP enable zero-shot classification by aligning images and text in a shared embedding space, offering advantages for defense applications with scarce labeled data. However, CLIP's robustness in challenging military environments, with partial occlusion and degraded signal-to-noise ratio (SNR), remains underexplored. We investigate the robustness of CLIP variants to occlusion using a custom dataset of 18 military vehicle classes, and evaluate using Normalized Area Under the Curve (NAUC) across occlusion percentages. Four key insights emerge: (1) Transformer-based CLIP models consistently outperform CNNs, (2) fine-grained, dispersed occlusions degrade performance more than larger contiguous occlusions, (3) despite improved overall accuracy, the performance of linear-probed models drops sharply at around 35% occlusion, and (4) finetuning the model's backbone delays this drop to beyond 60% occlusion. These results underscore the importance of occlusion-specific augmentations during training and the need for further exploration into patch-level sensitivity and architectural resilience for real-world deployment of CLIP.
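The abstract evaluates robustness with a Normalized Area Under the Curve (NAUC) over occlusion percentages. As a minimal sketch (assuming NAUC here means the trapezoidal area under the accuracy-vs-occlusion curve, normalized so a classifier with perfect accuracy at every occlusion level scores 1.0; the paper's exact normalization may differ), the metric can be computed as:

```python
def nauc(occlusion_pcts, accuracies):
    """Normalized area under the accuracy-vs-occlusion curve.

    Assumption (not taken from the paper): trapezoidal integration of
    accuracy over the occlusion range, divided by the occlusion span,
    so a perfect classifier (accuracy 1.0 everywhere) scores 1.0.
    """
    if len(occlusion_pcts) != len(accuracies) or len(occlusion_pcts) < 2:
        raise ValueError("need two or more (occlusion, accuracy) pairs")
    area = 0.0
    # Trapezoidal rule over consecutive (occlusion, accuracy) points.
    for (x0, y0), (x1, y1) in zip(
        zip(occlusion_pcts, accuracies),
        zip(occlusion_pcts[1:], accuracies[1:]),
    ):
        area += (x1 - x0) * (y0 + y1) / 2.0
    span = occlusion_pcts[-1] - occlusion_pcts[0]
    return area / span


# Hypothetical accuracy curve: the sharp drop beyond ~35% occlusion
# mirrors the behavior the abstract reports for linear-probed models.
score = nauc([0, 25, 50, 75, 100], [0.90, 0.85, 0.60, 0.30, 0.10])
```

A single normalized scalar like this makes models with differently shaped degradation curves directly comparable across the full occlusion range.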


Authors (7)

Jan Erik van Woerden
Gertjan Burghouts
Lotte Nijskens
Alma M. Liezenga
Sabina van Rooij
Frank Ruis
Hugo J. Kuijf

Citation Format

Woerden, J.E. van, Burghouts, G., Nijskens, L., Liezenga, A.M., Rooij, S. van, Ruis, F., & Kuijf, H.J. (2025). Occlusion Robustness of CLIP for Military Vehicle Classification. arXiv. https://arxiv.org/abs/2508.20760

Journal Information
Publication Year: 2025
Language: en
Source Database: arXiv
Access: Open Access ✓