arXiv Open Access 2022

On learning history based policies for controlling Markov decision processes

Gandharv Patil Aditya Mahajan Doina Precup

Abstract

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural networks or history-based state abstraction, perform better than their memory-less counterparts, because function approximation in Markov decision processes (MDPs) can be viewed as inducing a partially observable MDP (POMDP). However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm, and we numerically evaluate its effectiveness on a set of continuous control tasks.
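The abstract's core observation, that aggregating states of an MDP induces partial observability which history-based features can resolve, can be illustrated with a toy sketch. This is not the paper's algorithm; all names and the recurrence below are hypothetical, chosen only to show two histories that a memory-less feature map cannot distinguish but a recurrent one can.

```python
# Illustrative sketch (hypothetical, not the paper's method): a memory-less
# feature of the current observation vs. a recurrent history-based feature.
import math

def memoryless_feature(obs):
    """Feature of the current observation only: phi(o_t)."""
    return math.tanh(obs)

def history_feature(history, decay=0.5):
    """Minimal recurrent abstraction z_t = tanh(decay * z_{t-1} + o_t),
    compressing the whole observation history into a single number."""
    z = 0.0
    for o in history:
        z = math.tanh(decay * z + o)
    return z

# Two different histories that end in the same observation:
h1, h2 = [1.0, 0.0], [-1.0, 0.0]
# A memory-less feature map aggregates them (same feature)...
same = memoryless_feature(h1[-1]) == memoryless_feature(h2[-1])
# ...while the history-based feature keeps them apart.
diff = history_feature(h1) != history_feature(h2)
```

From the memory-less agent's point of view the two histories collapse into one feature, so acting on that feature is effectively acting in a POMDP; the recurrent abstraction restores the distinction.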


Authors (3)

Gandharv Patil

Aditya Mahajan

Doina Precup

Citation Format

Patil, G., Mahajan, A., & Precup, D. (2022). On learning history based policies for controlling Markov decision processes. arXiv preprint arXiv:2211.03011. https://arxiv.org/abs/2211.03011

Journal Information

Publication Year: 2022
Language: en
Source Database: arXiv
Access: Open Access ✓