arXiv Open Access 2026

Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning

Leixin Chang Xinchen Yao Ben Liu Liangjing Yang Hua Chen

Abstract

On-policy reinforcement learning (RL) algorithms have shown great potential in robotic control, where effective exploration is crucial for efficient, high-quality policy learning. However, encouraging the agent to explore better trajectories efficiently remains a challenge. Most existing methods incentivize exploration by maximizing policy entropy or rewarding visits to novel states, regardless of those states' potential value. We propose a new form of directed exploration that uses analytical policy gradients from a differentiable dynamics model to inject task-aware, physics-guided guidance, steering the agent toward high-reward regions for accelerated and more effective policy learning.
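The core idea in the abstract, obtaining an analytical policy gradient by differentiating the return through a known differentiable dynamics model, can be illustrated on a toy problem. The sketch below is not the paper's method: the linear dynamics, linear policy, quadratic reward, and the backprop-through-time routine are all illustrative assumptions chosen so the gradient can be computed in closed form.

```python
import numpy as np

def rollout_and_grad(A, B, K, s0, T):
    """Toy analytical policy gradient via backprop through time.

    Dynamics: s_{t+1} = A s_t + B a_t, policy: a_t = K s_t,
    reward: r(s) = -||s||^2 (goal at the origin).
    Returns the return J = sum_{t=1..T} r(s_t) and its exact
    gradient dJ/dK, obtained with an adjoint (reverse) pass
    through the differentiable dynamics.
    """
    # Forward pass: roll out the closed-loop system and store states.
    states = [s0]
    for _ in range(T):
        s = states[-1]
        states.append(A @ s + B @ (K @ s))
    J = -sum(float(s @ s) for s in states[1:])

    # Backward (adjoint) pass: lam = dJ/ds_{t+1}, accumulated from the end.
    M = A + B @ K                     # closed-loop transition matrix
    gK = np.zeros_like(K)
    lam = np.zeros_like(s0)
    for t in range(T - 1, -1, -1):
        lam = lam + (-2.0 * states[t + 1])   # add dr/ds at s_{t+1}
        gK += np.outer(B.T @ lam, states[t]) # ds_{t+1}/dK = B (dK) s_t
        lam = M.T @ lam                      # propagate adjoint to s_t
    return J, gK
```

In the paper's setting this analytical gradient would not replace the on-policy update; it would serve as a task-aware direction biasing exploration toward high-reward regions, rather than relying on entropy bonuses or novelty alone.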

Topics & Keywords


Citation Format

Chang, L., Yao, X., Liu, B., Yang, L., Chen, H. (2026). Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning. https://arxiv.org/abs/2603.27317

Journal Information
Publication Year
2026
Language
en
Source Database
arXiv
Access
Open Access ✓