EO-MADDPG: An Improved Reinforcement Learning Approach for Multi-UAV Pursuit–Evasion Games
Abstract
To advance research in multi-agent reinforcement learning (MARL) for pursuit–evasion scenarios, this paper introduces a novel algorithm called Expert Knowledge and Opponent Modeling Multi-UAV Deep Deterministic Policy Gradient (EO-MADDPG). EO-MADDPG consists of two key components: the integration of expert knowledge with real-time sampled data, and the prediction of evader UAV actions. The expert knowledge comprises a multi-UAV formation control algorithm and an encirclement strategy that incorporates consensus algorithms and Apollonius circle guidance. Additionally, the network-training framework is optimized by integrating information about opponent actions under a fixed policy, improving prediction accuracy. The experiments focus on three vs. one and three vs. two scenarios, in which pursuer UAVs use EO-MADDPG and evader UAVs follow fixed policies with Gaussian perturbations. Experimental results show that EO-MADDPG achieves success rates of 99.9 ± 0.3% and 97.5 ± 1.4% (mean ± std over five seeds) in three vs. one and three vs. two pursuit–evasion simulations, respectively, outperforming the baseline MADDPG (72.7 ± 6.0% and 64.4 ± 34.4%). Ablation studies and cooperative landmark tasks further demonstrate improved training stability and interpretability.
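The Apollonius circle guidance cited in the abstract rests on a standard geometric construction: given a pursuer and a slower evader, the set of points the evader can reach no later than the pursuer is the interior of a circle. A minimal sketch, assuming a speed ratio λ = v_e / v_p < 1 and 2-D positions (the function name and interface are illustrative, not from the paper):

```python
import math

def apollonius_circle(pursuer, evader, lam):
    """Center and radius of the Apollonius circle: the locus of points X with
    |X - evader| = lam * |X - pursuer|, where lam = v_e / v_p < 1.
    Points strictly inside the circle are reachable by the evader first;
    a pursuit strategy can steer toward this circle to cut off escape routes.
    """
    px, py = pursuer
    ex, ey = evader
    k = lam * lam
    # Derived by squaring the distance-ratio condition and completing the square.
    cx = (ex - k * px) / (1.0 - k)
    cy = (ey - k * py) / (1.0 - k)
    r = lam * math.hypot(ex - px, ey - py) / (1.0 - k)
    return (cx, cy), r

# Example: pursuer at the origin, evader at (1, 0), speed ratio 0.5.
center, radius = apollonius_circle((0.0, 0.0), (1.0, 0.0), 0.5)
# center ≈ (1.3333, 0.0), radius ≈ 0.6667
```

As a sanity check, the boundary point (2/3, 0) lies at distance 1/3 from the evader and 2/3 from the pursuer, matching the ratio λ = 0.5.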
Topics & Keywords
Authors (6)
Xiao Wang
Mengyu Wang
Xueqian Bai
Zhe Ma
Kewu Sun
Jiake Li
Quick Access
- Publication Year: 2026
- Database Source: DOAJ
- DOI: 10.3390/aerospace13030296
- Access: Open Access ✓