Dual Preference Learning for Multi-Agent Reinforcement Learning
Abstract
Designing effective reward functions is a fundamental challenge in reinforcement learning, especially in complex multi-agent systems with intricate credit assignment. Preference-based reinforcement learning (PbRL) offers an alternative to manual reward engineering by learning from preferences. However, the prevalent approach in PbRL, pairwise trajectory comparison, encounters difficulties when applied to multi-agent systems due to exponentially large state-action spaces and the temporal credit assignment problem. To address these challenges, we introduce Dual Preference-based Multi-Agent Reinforcement Learning (DPM), which uniquely employs dual preferences: trajectory-level comparisons and, crucially, transition-level comparisons of agent contributions. This allows DPM to distinguish cooperative from non-cooperative actions, improving reward learning efficiency. Furthermore, DPM leverages Large Language Models to generate dual preferences, significantly reducing reliance on costly human feedback and the potential for human error. Experiments on the StarCraft Multi-Agent Challenge (SMAC), SMACv2, and Google Research Football show significant performance gains over baselines, demonstrating DPM's effectiveness in optimizing rewards and improving performance in multi-agent systems.
Topics & Keywords
Authors (4)
Sehyeok Kang
Minu Kim
Jihwan Oh
Se-Young Yun
Quick Access
PDF not directly available
Check the original source →
- Year Published: 2026
- Source Database: DOAJ
- DOI: 10.1109/ACCESS.2025.3645778
- Access: Open Access ✓