arXiv Open Access 2023

Reinforcement Learning with History-Dependent Dynamic Contexts

Guy Tennenholtz Nadav Merlis Lior Shani Martin Mladenov Craig Boutilier

Abstract

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
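The abstract's central structural idea is that logistic DCMDPs avoid an exponential dependence on history length by summarizing the history with an aggregation function and passing that summary through a logistic (softmax) model of context transitions. The sketch below is a hypothetical illustration of that idea only, not the paper's actual model: the summation aggregator, the weight matrix `W`, and the function name are all assumptions made for demonstration.

```python
import numpy as np

def next_context_distribution(history_features, W):
    """Illustrative sketch of a logistic context transition.

    history_features: (T, d) array of per-step feature vectors.
    W: (k, d) weight matrix, one row per candidate context.
    Returns a probability distribution over the k contexts.
    """
    # Aggregate the history by summation (one simple choice of
    # aggregation function); the summary size is independent of
    # history length T, which is what breaks the exponential blowup.
    phi = history_features.sum(axis=0)      # shape: (d,)
    logits = W @ phi                        # one logit per context
    logits -= logits.max()                  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Usage: 5 history steps with 3-dim features, 4 candidate contexts.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 3))
W = rng.normal(size=(4, 3))
p = next_context_distribution(H, W)        # valid distribution over contexts
```

Because the aggregate `phi` is a fixed-size statistic, planning can condition on it rather than on the raw history, which is the structure the paper's UCB-style algorithm exploits.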

Authors (5)

Guy Tennenholtz

Nadav Merlis

Lior Shani

Martin Mladenov

Craig Boutilier

Citation Format

Tennenholtz, G., Merlis, N., Shani, L., Mladenov, M., & Boutilier, C. (2023). Reinforcement Learning with History-Dependent Dynamic Contexts. arXiv:2302.02061. https://arxiv.org/abs/2302.02061

Journal Information

Publication Year
2023
Language
English
Source Database
arXiv
Access
Open Access ✓