Semantic Scholar · Open Access · 2021 · 360 citations

Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Xinyue Chen Che Wang Zijian Zhou K. Ross

Abstract

Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods for continuous-action DRL benchmarks. In this paper, we introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show that its performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark. Moreover, REDQ can achieve this performance using fewer parameters than the model-based method, and with less wall-clock run time. REDQ has three carefully integrated ingredients which allow it to achieve its high performance: (i) a UTD ratio >> 1; (ii) an ensemble of Q functions; (iii) in-target minimization across a random subset of Q functions from the ensemble. Through carefully designed experiments, we provide a detailed analysis of REDQ and related model-free algorithms. To our knowledge, REDQ is the first successful model-free DRL algorithm for continuous-action spaces using a UTD ratio >> 1.
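
The three ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `redq_target`, the ensemble size, the subset size, the discount factor, and the UTD value below are all hypothetical choices for illustration, and the sketch leaves out the policy, the network updates, and any other terms a full algorithm would include in the target.

```python
import random

def redq_target(q_next, reward, done, rng, subset_size=2, gamma=0.99):
    """Bootstrapped target for a single transition (illustrative only).

    q_next: list of N estimates Q_i(s', a'), one per ensemble member.
    """
    # Pick `subset_size` distinct ensemble members uniformly at random.
    idx = rng.sample(range(len(q_next)), subset_size)
    # In-target minimization over only the sampled subset.
    min_q = min(q_next[i] for i in idx)
    # Terminal transitions (done = 1.0) do not bootstrap.
    target = reward + gamma * (1.0 - done) * min_q
    return target, idx

rng = random.Random(0)
q_next = [1.0, 2.0, 3.0, 4.0]  # hypothetical ensemble of N = 4 estimates
utd_ratio = 3                  # hypothetical: several updates per environment step

targets = []
for _ in range(utd_ratio):
    # Each update re-draws the subset, so the target can differ between updates.
    target, idx = redq_target(q_next, reward=0.5, done=0.0, rng=rng)
    targets.append((target, idx))
```

Taking the minimum over a small random subset rather than over the whole ensemble is the design choice the abstract highlights as ingredient (iii); the loop over `utd_ratio` stands in for performing more than one update per collected data point.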

Authors (4)

Xinyue Chen

Che Wang

Zijian Zhou

K. Ross

Citation Format

Chen, X., Wang, C., Zhou, Z., Ross, K. (2021). Randomized Ensembled Double Q-Learning: Learning Fast Without a Model. https://www.semanticscholar.org/paper/736590f70e7f2dc464c1c62491cfa8adb4d718f3

Journal Information
Publication Year: 2021
Language: en
Total Citations: 360
Source Database: Semantic Scholar
Access: Open Access