arXiv Open Access 2015

Emphatic TD Bellman Operator is a Contraction

Assaf Hallak, Aviv Tamar, Shie Mannor

Abstract

Recently, Sutton et al. (2015) introduced the emphatic temporal-difference (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation underlying ETD involves a contraction operator with contraction modulus $\sqrt{\gamma}$, where $\gamma$ is the discount factor. This allows us to bound the approximation error of ETD. To our knowledge, these are the first error bounds for an off-policy evaluation algorithm under general target and behavior policies.
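As a minimal numerical sketch (not the authors' code or proof), consider the special case λ = 0 with unit interest, where the emphatic weighting is assumed to be $m^\top = d_\mu^\top (I - \gamma P_\pi)^{-1}$, with $d_\mu$ the behavior policy's state distribution and $P_\pi$ the target policy's transition matrix. Because $\gamma\, m^\top P_\pi = m^\top - d_\mu^\top$, Jensen's inequality gives $\|\gamma P_\pi v\|_m^2 \le \gamma\,(m - d_\mu)^\top v^2 \le \gamma \|v\|_m^2$ (with $v^2$ taken elementwise), and since the $m$-weighted projection is non-expansive, the projected operator inherits the $\sqrt{\gamma}$ modulus. The script below checks this inequality on a small random MDP; the sizes, seed, and helper names are illustrative assumptions.

```python
import math
import random


def random_stochastic_matrix(n_rows, n_cols, rng):
    """Return a row-stochastic matrix with strictly positive entries."""
    rows = []
    for _ in range(n_rows):
        row = [rng.random() + 1e-3 for _ in range(n_cols)]
        total = sum(row)
        rows.append([x / total for x in row])
    return rows


def emphatic_weights(P, d, gamma, iters=500):
    """Solve m = d + gamma * P^T m by fixed-point iteration.

    Assumes ETD(0) with unit interest, so m^T = d^T (I - gamma P)^{-1}.
    The iteration contracts by gamma in the 1-norm because P is row-stochastic.
    """
    n = len(d)
    m = list(d)
    for _ in range(iters):
        m = [d[s] + gamma * sum(P[sp][s] * m[sp] for sp in range(n)) for s in range(n)]
    return m


def weighted_norm(v, w):
    """Weighted Euclidean norm sqrt(sum_s w_s * v_s^2)."""
    return math.sqrt(sum(ws * vs * vs for ws, vs in zip(w, v)))


def contraction_ratio(P, m, v, gamma):
    """Return ||gamma P v||_m / ||v||_m for one vector v."""
    n = len(v)
    gPv = [gamma * sum(P[s][sp] * v[sp] for sp in range(n)) for s in range(n)]
    return weighted_norm(gPv, m) / weighted_norm(v, m)


rng = random.Random(0)
n, gamma = 5, 0.9
P = random_stochastic_matrix(n, n, rng)      # hypothetical target-policy transitions P_pi
d = random_stochastic_matrix(1, n, rng)[0]   # hypothetical behavior state distribution d_mu
m = emphatic_weights(P, d, gamma)

# Largest observed ratio over random test vectors; the bound says it is at most sqrt(gamma).
worst = max(
    contraction_ratio(P, m, [rng.gauss(0.0, 1.0) for _ in range(n)], gamma)
    for _ in range(1000)
)
print(f"worst ratio = {worst:.4f}, sqrt(gamma) = {math.sqrt(gamma):.4f}")
```

With the plain $d_\mu$ weighting in place of `m`, no such bound holds in general, which is why ordinary off-policy TD can diverge.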

Authors (3)

Assaf Hallak

Aviv Tamar

Shie Mannor

Citation

Hallak, A., Tamar, A., Mannor, S. (2015). Emphatic TD Bellman Operator is a Contraction. https://arxiv.org/abs/1508.03411

Publication Information
Year Published: 2015
Language: English
Database Source: arXiv
Access: Open Access