arXiv
Open Access
2022
Online Reinforcement Learning for Periodic MDP
Ayush Aniket
Arpan Chattopadhyay
Abstract
We study learning in periodic Markov decision processes (MDPs), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average-reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 grows linearly with the period and sublinearly with the horizon length. Numerical results demonstrate the efficacy of PUCRL2.
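The state-augmentation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `augment_periodic_mdp`, the nested-list layout `P[n][s][a][s2]` and `R[n][s][a]`, and the flat indexing of the augmented state `(n, s)` as `n * S + s` are all assumptions made here for illustration. The only property taken from the abstract is that the period index becomes part of the state, so the augmented process is stationary.

```python
def augment_periodic_mdp(P, R):
    """Build a stationary MDP from a periodic one (illustrative sketch).

    P[n][s][a][s2]: transition probability at period index n (assumed layout).
    R[n][s][a]:     mean reward at period index n (assumed layout).
    Augmented state (n, s) is stored at flat index n * S + s (assumed convention).
    """
    N, S, A = len(P), len(P[0]), len(P[0][0])
    P_aug = [[[0.0] * (N * S) for _ in range(A)] for _ in range(N * S)]
    R_aug = [[0.0] * A for _ in range(N * S)]
    for n in range(N):
        nxt = (n + 1) % N  # the period index advances deterministically
        for s in range(S):
            for a in range(A):
                R_aug[n * S + s][a] = R[n][s][a]
                for s2 in range(S):
                    # all probability mass moves into the block of period nxt
                    P_aug[n * S + s][a][nxt * S + s2] = P[n][s][a][s2]
    return P_aug, R_aug


# Toy instance: period N = 2, S = 2 states, A = 1 action (made-up numbers).
P = [
    [[[0.9, 0.1]], [[0.2, 0.8]]],  # period index 0
    [[[0.5, 0.5]], [[0.0, 1.0]]],  # period index 1
]
R = [
    [[1.0], [0.0]],
    [[0.0], [2.0]],
]
P_aug, R_aug = augment_periodic_mdp(P, R)
```

The augmented model has N * S states, and its transition kernel no longer depends on time, which is what lets a stationary-MDP learning algorithm be applied to it.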
Journal Information
- Year of publication: 2022
- Language: en
- Source database: arXiv
- Access: Open Access