
Rethinking the Foundations for Continual Reinforcement Learning

Esraa Elelimy, David Szepesvari, Martha White, Michael Bowling

Abstract

In the traditional view of reinforcement learning, the agent's goal is to find an optimal policy that maximizes its expected sum of rewards. Once the agent finds this policy, the learning ends. This view contrasts with continual reinforcement learning, where learning does not end, and agents are expected to continually learn and adapt indefinitely. Despite the clear distinction between these two paradigms of learning, much of the progress in continual reinforcement learning has been shaped by foundations rooted in the traditional view of reinforcement learning. In this paper, we first examine whether the foundations of traditional reinforcement learning are suitable for the continual reinforcement learning paradigm. We identify four key pillars of the traditional reinforcement learning foundations that are antithetical to the goals of continual learning: the Markov decision process formalism, the focus on atemporal artifacts, the expected sum of rewards as an evaluation metric, and episodic benchmark environments that embrace the other three foundations. We then propose a new formalism that sheds the first and the third foundations and replaces them with the history process as a mathematical formalism and a new definition of deviation regret, adapted for continual learning, as an evaluation metric. Finally, we discuss possible approaches to shed the other two foundations.


Citation Format

Elelimy, E., Szepesvari, D., White, M., & Bowling, M. (2025). Rethinking the Foundations for Continual Reinforcement Learning. arXiv:2504.08161. https://arxiv.org/abs/2504.08161

Journal Information

Year Published: 2025
Language: English
Source Database: arXiv
Access: Open Access ✓