Unified Multi-Modal Object Tracking Through Spatial–Temporal Propagation and Modality Synergy
Abstract
Multi-modal object tracking (MMOT) has received widespread attention for its ability to overcome the perception limitations of single sensors. However, existing methods face several critical challenges. The representation learning and generalization capabilities of models are constrained by the inherent heterogeneity of cross-task multi-modal data and by inter-modal synergy imbalance. In particular, in dynamically changing complex scenarios, the reliability and stability of the data degrade significantly, further exacerbating the difficulty of consistent multi-modal perception and aggregation. To tackle these issues, we propose SMUTrack, a unified framework with globally shared parameters that integrates three downstream MMOT tasks. SMUTrack implements a batch merging-and-splitting alternating strategy, coupled with multi-task joint training, to establish latent correlations across inter- and intra-task modalities, effectively avoiding over-reliance on particular modalities. Concurrently, we design a hierarchical modality synergy and reinforcement (HMSR) module and a gated fusion and context awareness (GFCA) module to enable progressive multi-modal information exchange and integration, yielding a more discriminative and robust multi-modal representation. More importantly, we introduce a spatial–temporal information propagation (SIP) mechanism, which synchronously learns object trajectory cues and appearance variations to effectively build contextual relationships in long-term video tracking. Experimental results validate the outstanding performance of SMUTrack on mainstream MMOT datasets, demonstrating its strong adaptability to various MMOT tasks.
Topics & Keywords
Authors (5)
Jiajia Wu
Haorui Zuo
Yuxing Wei
Meihui Li
Jianlin Zhang
Quick Access
- Publication Year
- 2025
- Source Database
- DOAJ
- DOI
- 10.3390/jimaging11120421
- Access
- Open Access ✓