arXiv Open Access 2020

Convergence results for an averaged LQR problem with applications to reinforcement learning

Andrea Pesare, Michele Palladino, Maurizio Falcone

Abstract

In this paper, we deal with a Linear Quadratic Optimal Control problem with unknown dynamics. As a modeling assumption, we suppose that the knowledge an agent has of the current system is represented by a probability distribution $π$ on the space of matrices. Furthermore, we assume that this probability measure is suitably updated to take into account the increased experience the agent obtains while exploring the environment, approximating the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the "average" Linear Quadratic Optimal Control problem with respect to a certain $π$ converges to the optimal control of the Linear Quadratic Optimal Control problem governed by the actual, underlying dynamics. This approach is closely related to model-based Reinforcement Learning algorithms, where prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we present a numerical test that confirms the theoretical results.
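The convergence mechanism described in the abstract can be illustrated in the simplest scalar case: as the distribution $π$ concentrates on the true dynamics, the control obtained from the model the agent believes in approaches the true optimal control. The following sketch is only illustrative — the parameter values, the biased-estimate model of $π$, and the simplification of plugging a point estimate into the LQR problem are assumptions for this example, not the paper's actual averaged cost functional:

```python
import math

def lqr_gain(a, b=1.0, q=1.0, r=1.0):
    """Scalar continuous-time LQR: minimize the integral of q*x^2 + r*u^2
    subject to x' = a*x + b*u.  The algebraic Riccati equation
    2*a*p - (b^2/r)*p^2 + q = 0 has the positive root below; the optimal
    feedback is u = -k*x with k = b*p/r."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

a_true = 0.5                  # hypothetical true (unknown) dynamics
k_true = lqr_gain(a_true)     # gain of the true LQR problem

# As the agent explores, its belief concentrates on a_true; here the
# belief is crudely summarized by a point estimate with shrinking error.
gaps = []
for sigma in (1.0, 0.1, 0.01):
    a_est = a_true + sigma    # estimate under belief with spread sigma
    gaps.append(abs(lqr_gain(a_est) - k_true))

print(gaps)  # the gap to the true optimal gain shrinks with sigma
```

Running this prints a strictly decreasing sequence of gaps, mirroring the paper's claim that the control computed under an increasingly accurate belief converges to the optimal control of the true system.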


Authors (3)

Andrea Pesare

Michele Palladino

Maurizio Falcone

Citation Format

Pesare, A., Palladino, M., Falcone, M. (2020). Convergence results for an averaged LQR problem with applications to reinforcement learning. https://arxiv.org/abs/2011.03447

Journal Information
Year Published: 2020
Language: en
Source Database: arXiv
Access: Open Access ✓