Semantic Scholar Open Access 2019 162 citations

Meta-Q-Learning

Rasool Fakoor P. Chaudhari Stefano Soatto Alex Smola

Abstract

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with the state of the art in meta-RL.
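
The abstract says MQL reuses meta-training replay data on a new task by drawing on propensity estimation, but it does not spell out the estimator. The sketch below shows one common way to do this, under stated assumptions: a logistic classifier is trained to tell old (replay buffer) samples from new-task samples, and its odds are turned into importance weights for the old samples. The function names (fit_logistic, importance_weights, effective_sample_size), the plain gradient-descent fit, and the normalized effective sample size diagnostic are illustrative choices, not the authors' implementation.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit logistic regression by plain gradient descent.

    Returns the weight vector with the bias as its last entry.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # P(label = 1 | x)
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient of mean log-loss
    return w


def importance_weights(old, new):
    """Estimate beta(x) ~ p_new(x) / p_old(x) for every row of `old`.

    `old` and `new` are 2-D arrays of features (hypothetical stand-ins
    for replay-buffer transitions and new-task transitions).
    """
    X = np.vstack([old, new])
    y = np.concatenate([np.zeros(len(old)), np.ones(len(new))])
    w = fit_logistic(X, y)
    logits = np.hstack([old, np.ones((len(old), 1))]) @ w
    # exp(logit) is the classifier odds P(new | x) / P(old | x);
    # multiplying by len(old) / len(new) removes the class-size prior.
    return np.exp(logits) * len(old) / len(new)


def effective_sample_size(beta):
    """Normalized effective sample size, a value in (0, 1].

    Values near 1 mean the old data looks like the new data;
    values near 0 mean only a few old samples carry the weight.
    """
    return beta.sum() ** 2 / (len(beta) * (beta ** 2).sum())
```

In an off-policy update, weights like these would multiply the per-sample loss on old data so that transitions resembling the new task count more. How MQL itself combines the weights with its Q-learning objective should be checked against the paper.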

Topics & Keywords

Computer Science Mathematics

Authors (4)

Rasool Fakoor

P. Chaudhari

Stefano Soatto

Alex Smola

Citation Format

Fakoor, R., Chaudhari, P., Soatto, S., & Smola, A. (2019). Meta-Q-Learning. https://www.semanticscholar.org/paper/405d2cc9cccc035b4c8d950d6cfab4cc1a5d0628

Journal Information

Publication Year: 2019
Language: en
Total Citations: 162
Source Database: Semantic Scholar
Access: Open Access ✓