arXiv Open Access 2026

Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning

Leixin Chang Xinchen Yao Ben Liu Liangjing Yang Hua Chen

Abstract

On-policy reinforcement learning (RL) algorithms have shown great potential in robotic control, where effective exploration is crucial for efficient, high-quality policy learning. However, encouraging the agent to explore better trajectories efficiently remains a challenge. Most existing methods incentivize exploration by maximizing policy entropy or rewarding visits to novel states, regardless of those states' potential value. We propose a new form of directed exploration that uses analytical policy gradients from a differentiable dynamics model to inject task-aware, physics-guided guidance, steering the agent toward high-reward regions for accelerated and more effective policy learning.
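The core idea in the abstract, obtaining an analytical policy gradient by differentiating the return through a known differentiable dynamics model, can be illustrated on a toy problem. The sketch below is not the paper's method: the linear dynamics, linear policy, quadratic reward, and the backprop-through-time routine are all illustrative assumptions chosen so the gradient can be computed in closed form.

```python
import numpy as np

def rollout_and_grad(A, B, K, s0, T):
    """Toy analytical policy gradient via backprop through time.

    Dynamics: s_{t+1} = A s_t + B a_t, policy: a_t = K s_t,
    reward: r(s) = -||s||^2 (goal at the origin).
    Returns the return J = sum_{t=1..T} r(s_t) and its exact
    gradient dJ/dK, obtained with an adjoint (reverse) pass
    through the differentiable dynamics.
    """
    # Forward pass: roll out the closed-loop system and store states.
    states = [s0]
    for _ in range(T):
        s = states[-1]
        states.append(A @ s + B @ (K @ s))
    J = -sum(float(s @ s) for s in states[1:])

    # Backward (adjoint) pass: lam = dJ/ds_{t+1}, accumulated from the end.
    M = A + B @ K                     # closed-loop transition matrix
    gK = np.zeros_like(K)
    lam = np.zeros_like(s0)
    for t in range(T - 1, -1, -1):
        lam = lam + (-2.0 * states[t + 1])   # add dr/ds at s_{t+1}
        gK += np.outer(B.T @ lam, states[t]) # ds_{t+1}/dK = B (dK) s_t
        lam = M.T @ lam                      # propagate adjoint to s_t
    return J, gK
```

In the paper's setting this analytical gradient would not replace the on-policy update; it would serve as a task-aware direction biasing exploration toward high-reward regions, rather than relying on entropy bonuses or novelty alone.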

Topics & Keywords


Citation Format

Chang, L., Yao, X., Liu, B., Yang, L., Chen, H. (2026). Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning. https://arxiv.org/abs/2603.27317

Journal Information
Publication Year
2026
Language
en
Source Database
arXiv
Access
Open Access ✓