DOAJ Open Access 2026

Dual Preference Learning for Multi-Agent Reinforcement Learning

Sehyeok Kang Minu Kim Jihwan Oh Se-Young Yun

Abstract

Designing effective reward functions is a fundamental challenge in reinforcement learning, especially in complex multi-agent systems with intricate credit assignment. Preference-based reinforcement learning (PbRL) offers an alternative to manual reward engineering by learning from preferences. However, the prevalent approach in PbRL, which relies on pairwise trajectory comparisons, encounters difficulties when applied to multi-agent systems due to exponentially large state-action spaces and the temporal credit assignment problem. To address these challenges, we introduce Dual Preference-based Multi-Agent Reinforcement Learning (DPM), which uniquely employs dual preferences: trajectory-level comparisons and, crucially, transition-level comparisons of individual agents' contributions. This allows the reward model to distinguish cooperative from non-cooperative actions, improving reward learning efficiency. Furthermore, DPM leverages Large Language Models to generate dual preferences, significantly reducing reliance on costly human feedback and the potential for human error. Experiments on the StarCraft Multi-Agent Challenge, SMACv2, and Google Research Football show significant performance gains over baselines, demonstrating DPM's effectiveness in optimizing rewards and improving performance in multi-agent systems.
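The pairwise trajectory comparison the abstract refers to is commonly modeled in the PbRL literature with a Bradley-Terry preference loss over predicted segment returns. The sketch below illustrates that standard formulation only; it is not DPM's exact objective, and the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def bradley_terry_loss(returns_a, returns_b, prefs):
    """Cross-entropy loss for pairwise trajectory preferences.

    returns_a, returns_b : predicted cumulative rewards for each
        segment in a pair, under the learned reward model.
    prefs : 1.0 if segment A was preferred, 0.0 if segment B was.
    """
    # Bradley-Terry model: P(A > B) = sigmoid(R_A - R_B)
    p_a = 1.0 / (1.0 + np.exp(returns_b - returns_a))
    p_a = np.clip(p_a, 1e-8, 1.0 - 1e-8)  # numerical safety
    # Binary cross-entropy between model probability and the label
    return float(np.mean(-(prefs * np.log(p_a)
                           + (1.0 - prefs) * np.log(1.0 - p_a))))

# Usage: a pair where A clearly outscores B and A is preferred
# yields a small loss; equal returns give the log(2) chance level.
confident = bradley_terry_loss(np.array([5.0]), np.array([0.0]), np.array([1.0]))
uncertain = bradley_terry_loss(np.array([0.0]), np.array([0.0]), np.array([1.0]))
```

Minimizing this loss drives the reward model to assign higher cumulative reward to preferred segments; DPM's contribution, per the abstract, is to combine such trajectory-level labels with transition-level agent-contribution labels.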

Authors (4)

Sehyeok Kang

Minu Kim

Jihwan Oh

Se-Young Yun

Citation Format

Kang, S., Kim, M., Oh, J., & Yun, S. (2026). Dual Preference Learning for Multi-Agent Reinforcement Learning. https://doi.org/10.1109/ACCESS.2025.3645778

Quick Access

PDF not directly available

Journal Information
Publication Year
2026
Source Database
DOAJ
DOI
10.1109/ACCESS.2025.3645778
Access
Open Access ✓