arXiv
Open Access
2015
Emphatic TD Bellman Operator is a Contraction
Assaf Hallak
Aviv Tamar
Shie Mannor
Abstract
Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator, with a $\sqrt{\gamma}$-contraction modulus (where $\gamma$ is the discount factor). This allows us to provide error bounds on the approximation error of ETD. To our knowledge, these are the first error bounds for an off-policy evaluation algorithm under general target and behavior policies.
Authors (3)
Assaf Hallak
Aviv Tamar
Shie Mannor
Journal Information
- Publication Year: 2015
- Language: en
- Database Source: arXiv
- Access: Open Access ✓