Vision transformer embedded video anomaly detection using attention driven recurrence
Abstract
Automated video anomaly detection (VAD) is a challenging task because anomalies are context-dependent and occur sporadically. However, recent deep learning advancements offer promising solutions. In this paper, we propose a novel framework for detecting anomalies in videos by jointly analyzing spatial and temporal (spatio-temporal) features. We address challenges such as the processing of lengthy videos and the sparse occurrence of anomalies by segmenting and labeling anomalous parts within videos. We employ a modified pre-trained vision transformer for video feature extraction, leveraging its ability to capture complex spatio-temporal patterns and the global context. Additionally, we incorporate a parameter-efficient recurrent model, the Simple Recurrent Unit Plus Plus (SRU++), which processes long sequential video embeddings efficiently, reducing computational cost tenfold compared to traditional methods. To further enhance multiclass prediction performance, we develop a cluster-based weighting mechanism that assigns weights to classification scores based on feature similarity. We extensively evaluated our approach on three popular datasets: UCF-Crime, RWF-2000, and Smart City CCTV Violence Detection (SCVD). It achieves superior performance compared to state-of-the-art methods, making it well suited to real-world surveillance applications.
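The abstract does not spell out how the cluster-based weighting is computed. As an illustration only, the sketch below shows one plausible reading: each class has a centroid in feature space, and a clip's classification scores are rescaled by how similar its embedding is to each centroid. The function names (`cosine`, `softmax`, `reweight_scores`), the use of cosine similarity, the softmax weighting, and the toy vectors are all assumptions made for this example, not the authors' implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def softmax(values, temperature=1.0):
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def reweight_scores(embedding, class_scores, centroids, temperature=1.0):
    """Hypothetical weighting step: scale each class score by the similarity
    of the embedding to that class's centroid, then renormalize to sum to 1."""
    sims = [cosine(embedding, c) for c in centroids]
    weights = softmax(sims, temperature)
    weighted = [s * w for s, w in zip(class_scores, weights)]
    total = sum(weighted)
    return [w / total for w in weighted] if total else list(class_scores)

# Toy data (assumed): three classes with 2-D centroids and one clip embedding.
centroids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
embedding = [0.9, 0.1]           # lies closest to the first centroid
class_scores = [0.4, 0.4, 0.2]   # the classifier alone cannot separate classes 0 and 1
adjusted = reweight_scores(embedding, class_scores, centroids)
```

In this toy case the classifier scores for the first two classes are tied, and the similarity weights break the tie in favor of the class whose centroid is nearest to the embedding. The paper's actual clustering procedure and weighting formula may differ.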
Authors (6)
Ummay Maria Muna
Shanta Biswas
Syed Abu Ammar Muhammad Zarif
Philip Jefferson Deori
Tauseef Tajwar
Swakkhar Shatabda
- Publication Year
- 2025
- Source Database
- DOAJ
- DOI
- 10.1016/j.array.2025.100471
- Access
- Open Access ✓