Empirical Evaluation of Invariances in Deep Vision Models
Abstrak
The ability of deep learning models to maintain consistent performance under image transformations-termed invariances, is critical for reliable deployment across diverse computer vision applications. This study presents a comprehensive empirical evaluation of modern convolutional neural networks (CNNs) and vision transformers (ViTs) concerning four fundamental types of image invariances: blur, noise, rotation, and scale. We analyze a curated selection of thirty models across three common vision tasks, object localization, recognition, and semantic segmentation, using benchmark datasets including COCO, ImageNet, and a custom segmentation dataset. Our experimental protocol introduces controlled perturbations to test model robustness and employs task-specific metrics such as mean Intersection over Union (mIoU), and classification accuracy (Acc) to quantify models’ performance degradation. Results indicate that while ViTs generally outperform CNNs under blur and noise corruption in recognition tasks, both model families exhibit significant vulnerabilities to rotation and extreme scale transformations. Notably, segmentation models demonstrate higher resilience to geometric variations, with SegFormer and Mask2Former emerging as the most robust architectures. These findings challenge prevailing assumptions regarding model robustness and provide actionable insights for designing vision systems capable of withstanding real-world input variability.
Topik & Kata Kunci
Penulis (3)
Konstantinos Keremis
Eleni Vrochidou
George A. Papakostas
Akses Cepat
- Tahun Terbit
- 2025
- Sumber Database
- DOAJ
- DOI
- 10.3390/jimaging11090322
- Akses
- Open Access ✓