arXiv
Open Access
2017
Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
Philip S. Thomas
Emma Brunskill
Abstract
We show how an action-dependent baseline can be used with the policy gradient theorem with function approximation, which Sutton et al. (2000) originally presented with action-independent baselines.
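The point of an action-dependent baseline b(s, a) can be illustrated with the identity Σ_a ∇π(a|s) q(s, a) = Σ_a π(a|s) ∇log π(a|s) (q(s, a) − b(s, a)) + Σ_a ∇π(a|s) b(s, a). When b does not depend on a, the last term is zero because Σ_a ∇π(a|s) = ∇1 = 0. When b does depend on a, that term is generally nonzero and has to be kept for the gradient to remain unbiased. The sketch below is a minimal numerical check of this general identity. It is not the paper's estimator. It assumes a single state, a tabular softmax policy, and made-up values for q and b. The helper names `grad_log_pi` and `exact_gradients` are hypothetical, and expectations are computed exactly rather than by sampling.

```python
import numpy as np


def softmax(theta):
    """Tabular softmax policy pi(a) for a single state."""
    z = np.exp(theta - theta.max())  # shift for numerical stability
    return z / z.sum()


def grad_log_pi(theta):
    """Row a holds d/dtheta log pi(a) = e_a - pi (softmax score function)."""
    pi = softmax(theta)
    return np.eye(len(theta)) - pi  # pi is broadcast and subtracted from every row


def exact_gradients(theta, q, b):
    """Hypothetical helper: exact expectations over a ~ pi for one state.

    Returns (true_grad, baselined_term, correction), where
      true_grad      = sum_a pi(a) q(a) grad log pi(a)          = sum_a grad pi(a) q(a)
      baselined_term = sum_a pi(a) (q(a) - b(a)) grad log pi(a)
      correction     = sum_a pi(a) b(a) grad log pi(a)          = sum_a grad pi(a) b(a)
    """
    pi = softmax(theta)
    scores = grad_log_pi(theta)  # shape (K, K), one score vector per action
    true_grad = (pi * q) @ scores
    baselined_term = (pi * (q - b)) @ scores
    correction = (pi * b) @ scores
    return true_grad, baselined_term, correction


# Hypothetical numbers chosen only for illustration (not from the paper).
theta = np.array([0.2, -0.5, 1.0])
q = np.array([1.0, 3.0, -2.0])
b_action = np.array([0.5, 2.5, -1.0])  # action-dependent baseline b(a)
b_state = np.full(3, 0.7)              # action-independent baseline b

g, g_base, g_corr = exact_gradients(theta, q, b_action)
print("true gradient:", g)
print("baselined term only:", g_base)                  # differs from g (biased)
print("baselined term + correction:", g_base + g_corr)  # recovers g

_, g_base_s, g_corr_s = exact_gradients(theta, q, b_state)
print("correction with state-only baseline:", g_corr_s)  # approximately zero
```

For a softmax policy, component i of the correction term works out to π(i)(b(i) − E_π[b]). This is zero exactly when the baseline is constant across actions, which is why the classical action-independent case needs no extra term.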
Journal Information
- Publication Year: 2017
- Language: English (en)
- Database Source: arXiv
- Access: Open Access