arXiv Open Access 2023

Reinforcement Learning with History-Dependent Dynamic Contexts

Guy Tennenholtz Nadav Merlis Lior Shani Martin Mladenov Craig Boutilier

Abstract

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
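The abstract's central structural idea is that logistic DCMDPs avoid an exponential dependence on history length by summarizing the history with an aggregation function and passing that summary through a logistic (softmax) model of context transitions. The sketch below is a hypothetical illustration of that idea only, not the paper's actual model: the summation aggregator, the weight matrix `W`, and the function name are all assumptions made for demonstration.

```python
import numpy as np

def next_context_distribution(history_features, W):
    """Illustrative sketch of a logistic context transition.

    history_features: (T, d) array of per-step feature vectors.
    W: (k, d) weight matrix, one row per candidate context.
    Returns a probability distribution over the k contexts.
    """
    # Aggregate the history by summation (one simple choice of
    # aggregation function); the summary size is independent of
    # history length T, which is what breaks the exponential blowup.
    phi = history_features.sum(axis=0)      # shape: (d,)
    logits = W @ phi                        # one logit per context
    logits -= logits.max()                  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Usage: 5 history steps with 3-dim features, 4 candidate contexts.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 3))
W = rng.normal(size=(4, 3))
p = next_context_distribution(H, W)        # valid distribution over contexts
```

Because the aggregate `phi` is a fixed-size statistic, planning can condition on it rather than on the raw history, which is the structure the paper's UCB-style algorithm exploits.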

Authors (5)

Guy Tennenholtz

Nadav Merlis

Lior Shani

Martin Mladenov

Craig Boutilier

Citation Format

Tennenholtz, G., Merlis, N., Shani, L., Mladenov, M., & Boutilier, C. (2023). Reinforcement Learning with History-Dependent Dynamic Contexts. arXiv:2302.02061. https://arxiv.org/abs/2302.02061

Journal Information

Publication Year
2023
Language
English
Source Database
arXiv
Access
Open Access ✓