arXiv Open Access 2022

Online Reinforcement Learning for Periodic MDP

Ayush Aniket, Arpan Chattopadhyay

Abstract

We study learning in a periodic Markov decision process (MDP), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average-reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 grows linearly with the period and sublinearly with the horizon length. Numerical results demonstrate the efficacy of PUCRL2.
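
The state augmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes finite state and action spaces, a known period N, and known per-phase kernels, which is only for illustration since the learning setting treats them as unknown. The function name `augment_periodic_mdp`, the array layout, and the indexing convention `n * S + s` are hypothetical choices made here, not taken from the paper.

```python
import numpy as np


def augment_periodic_mdp(P, R):
    """Build a stationary MDP from a periodic one (illustrative sketch).

    P: array of shape (N, S, A, S), P[n, s, a, s2] is the probability of
       moving from s to s2 under action a at period index n.
    R: array of shape (N, S, A), mean reward at period index n.
    Augmented state (s, n) is stored at index n * S + s (an assumed layout).
    """
    N, S, A, _ = P.shape
    P_aug = np.zeros((N * S, A, N * S))
    R_aug = np.zeros((N * S, A))
    for n in range(N):
        nxt = (n + 1) % N  # the period index advances deterministically
        rows = slice(n * S, (n + 1) * S)
        cols = slice(nxt * S, (nxt + 1) * S)
        # From block n, probability mass only reaches block (n + 1) mod N.
        P_aug[rows, :, cols] = P[n]
        R_aug[rows, :] = R[n]
    return P_aug, R_aug


# Small random instance: period 3, 2 states, 2 actions (made-up numbers).
rng = np.random.default_rng(0)
N, S, A = 3, 2, 2
P = rng.dirichlet(np.ones(S), size=(N, S, A))  # shape (N, S, A, S)
R = rng.uniform(size=(N, S, A))
P_aug, R_aug = augment_periodic_mdp(P, R)
```

The augmented kernel `P_aug` no longer depends on time, so a method designed for stationary MDPs can in principle be run on it. The augmented state space has N * S states, which gives some intuition for why the period shows up in the regret.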

Topics & Keywords

cs.LG

Authors (2)

Ayush Aniket

Arpan Chattopadhyay

Citation Format

Aniket, A., & Chattopadhyay, A. (2022). Online Reinforcement Learning for Periodic MDP. https://arxiv.org/abs/2207.12045

Journal Information

Publication Year: 2022
Language: en
Source Database: arXiv
Access: Open Access ✓