A deep reinforcement learning approach for emotion recognition from unaligned multimodal inputs
Abstract
Multimodal emotion recognition is essential in affective computing, as it enables a more accurate and comprehensive understanding of human emotions by integrating diverse data modalities. However, current approaches still face key challenges, including the difficulty of handling unaligned multimodal inputs, limited ability to model long-term dependencies, and insufficient attention to relationships among emotional labels. To address these issues, this paper introduces a unified framework that combines a Pseudo-Alignment Algorithm (PAA) for processing unaligned data, a Multimodal Data Interaction Process (MDIP) for fusing text, audio, and video while preserving long-term contextual information, and a Deep Reinforcement Learning-based Emotion Detection (DRLED) model for exploring inter-emotional dependencies. Experiments conducted on the IEMOCAP benchmark dataset demonstrate that the proposed approach achieves strong emotion recognition performance without relying on pre-aligned multimodal data, highlighting its effectiveness and robustness in real-world scenarios.
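The abstract does not detail how the Pseudo-Alignment Algorithm (PAA) works internally. A common baseline for coping with unaligned modalities is to resample each modality's feature sequence to a shared number of time steps, e.g. by linear interpolation along the time axis. The sketch below illustrates that general idea only; the function name, feature dimensions, and target length are illustrative assumptions, not the paper's PAA.

```python
import numpy as np

def pseudo_align(seqs, target_len):
    """Resample each modality's (T_i, d_i) feature sequence to a shared
    length via linear interpolation along the time axis.
    Illustrative baseline only -- not the paper's Pseudo-Alignment Algorithm."""
    aligned = []
    for seq in seqs:
        t_src = np.linspace(0.0, 1.0, num=seq.shape[0])
        t_dst = np.linspace(0.0, 1.0, num=target_len)
        # Interpolate each feature dimension independently onto the shared grid.
        cols = [np.interp(t_dst, t_src, seq[:, d]) for d in range(seq.shape[1])]
        aligned.append(np.stack(cols, axis=1))
    return aligned

# Example: text (12 steps), audio (300), video (75), all mapped to 50 steps.
text = np.random.randn(12, 768)
audio = np.random.randn(300, 74)
video = np.random.randn(75, 35)
out = pseudo_align([text, audio, video], target_len=50)
print([o.shape for o in out])  # [(50, 768), (50, 74), (50, 35)]
```

After this step, all three modalities share a common temporal resolution, so a downstream fusion module can attend across them step by step without requiring pre-aligned data.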
Topics & Keywords
Authors (2)
Jamal El Hamdaoui
El Habib Nfaoui
Quick Access
- Publication Year: 2026
- Database Source: DOAJ
- DOI: 10.1016/j.mlwa.2026.100873
- Access: Open Access ✓