Semantic Scholar · Open Access · 2023 · 253 citations

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

Philippe Hansen-Estruch Ilya Kostrikov Michael Janner J. Kuba S. Levine


Abstract

Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which policy actually attains the values represented by this implicitly trained Q-function. In this paper, we reinterpret IQL as an actor-critic method by generalizing the critic objective and connecting it to a behavior-regularized implicit actor. This generalization shows how the induced actor balances reward maximization and divergence from the behavior policy, with the specific loss choice determining the nature of this tradeoff. Notably, this actor can exhibit complex and multimodal characteristics, suggesting issues with the conditional Gaussian actor fit with advantage-weighted regression (AWR) used in prior methods. Instead, we propose using samples from a diffusion-parameterized behavior policy and weights computed from the critic to importance-sample our intended policy. We introduce Implicit Diffusion Q-learning (IDQL), combining our general IQL critic with the policy extraction method. IDQL maintains the ease of implementation of IQL while outperforming prior offline RL methods and demonstrating robustness to hyperparameters. Code is available at https://github.com/philippe-eecs/IDQL.
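
The policy extraction step described in the abstract (draw candidate actions from a behavior policy, weight them with the critic, then resample) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `resample_action`, the exponentiated-advantage weighting, and the `temperature` parameter are assumptions chosen for illustration, and plain arrays stand in for samples from a diffusion model and for critic outputs.

```python
import numpy as np


def resample_action(candidates, q_values, v_value, temperature=1.0, rng=None):
    """Pick one action from behavior-policy candidates by importance resampling.

    candidates: array of shape (N, action_dim), assumed to come from a
        behavior policy (a diffusion model in the paper; any sampler here).
    q_values: array of shape (N,), critic estimates Q(s, a_i).
    v_value: scalar critic estimate V(s).
    temperature: hypothetical knob controlling how greedy the resampling is.
    """
    rng = np.random.default_rng() if rng is None else rng
    advantages = np.asarray(q_values, dtype=float) - float(v_value)

    # Assumed weighting: exponentiated advantage, normalized like a softmax.
    # Subtracting the max keeps the exponentials numerically stable.
    logits = advantages / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    probs = weights / weights.sum()

    # Resample a single candidate in proportion to its weight.
    index = rng.choice(len(candidates), p=probs)
    return candidates[index], probs


# Toy usage: three candidate actions, the last one has the highest Q-value.
candidates = np.array([[0.0], [1.0], [2.0]])
q_values = np.array([0.0, 0.0, 10.0])
action, probs = resample_action(
    candidates, q_values, v_value=0.0, temperature=0.1,
    rng=np.random.default_rng(0),
)
```

Because the candidates all come from the behavior policy, the selected action stays within the data distribution, while the weights shift probability mass toward actions the critic rates highly. A lower `temperature` makes the choice closer to a greedy argmax over the candidates.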

Topics & Keywords

Computer Science

Authors (5)

Philippe Hansen-Estruch

Ilya Kostrikov

Michael Janner

J. Kuba

S. Levine

Citation Format


Hansen-Estruch, P., Kostrikov, I., Janner, M., Kuba, J., Levine, S. (2023). IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies. https://doi.org/10.48550/arXiv.2304.10573

Quick Access

View at source: doi.org/10.48550/arXiv.2304.10573

Journal Information

Publication Year: 2023
Language: en
Total Citations: 253
Source Database: Semantic Scholar
DOI: 10.48550/arXiv.2304.10573
Access: Open Access ✓