Semantic Scholar · Open Access · 2021 · 360 citations

Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Xinyue Chen Che Wang Zijian Zhou K. Ross

Abstract

Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods for continuous-action DRL benchmarks. In this paper, we introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show that its performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark. Moreover, REDQ can achieve this performance using fewer parameters than the model-based method, and with less wall-clock run time. REDQ has three carefully integrated ingredients which allow it to achieve its high performance: (i) a UTD ratio >> 1; (ii) an ensemble of Q functions; (iii) in-target minimization across a random subset of Q functions from the ensemble. Through carefully designed experiments, we provide a detailed analysis of REDQ and related model-free algorithms. To our knowledge, REDQ is the first successful model-free DRL algorithm for continuous-action spaces using a UTD ratio >> 1.
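
The three ingredients listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `QFunction`, `in_target_min`, `td_target`, `targets_for_one_env_step`, and `sample_transition`, as well as the default values, are hypothetical and chosen only for illustration. The sketch covers only the target computation; it assumes a standard one-step bootstrapped target and leaves out the policy, the gradient updates, and anything else the abstract does not describe.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: a Q function maps (state, action) to a scalar estimate.
QFunction = Callable[[Sequence[float], Sequence[float]], float]
# Hypothetical transition layout: (reward, next_state, next_action, done).
Transition = Tuple[float, Sequence[float], Sequence[float], bool]


def in_target_min(q_targets: Sequence[QFunction],
                  next_state: Sequence[float],
                  next_action: Sequence[float],
                  subset_size: int,
                  rng: random.Random) -> float:
    """Ingredients (ii) and (iii): minimum over a random subset of the ensemble."""
    subset = rng.sample(list(q_targets), subset_size)
    return min(q(next_state, next_action) for q in subset)


def td_target(q_targets: Sequence[QFunction],
              reward: float,
              next_state: Sequence[float],
              next_action: Sequence[float],
              done: bool,
              gamma: float = 0.99,
              subset_size: int = 2,
              rng: Optional[random.Random] = None) -> float:
    """One-step target using the in-target minimum (assumed form, see lead-in)."""
    rng = rng if rng is not None else random.Random()
    min_q = in_target_min(q_targets, next_state, next_action, subset_size, rng)
    not_done = 0.0 if done else 1.0  # no bootstrapping past a terminal state
    return reward + gamma * not_done * min_q


def targets_for_one_env_step(q_targets: Sequence[QFunction],
                             sample_transition: Callable[[], Transition],
                             utd_ratio: int = 20,
                             **kwargs) -> List[float]:
    """Ingredient (i): several updates per environment step (UTD ratio >> 1).

    `sample_transition` is a hypothetical stand-in for drawing from a replay
    buffer; a fresh random subset of Q functions is drawn for every target.
    """
    return [td_target(q_targets, *sample_transition(), **kwargs)
            for _ in range(utd_ratio)]
```

With three constant Q functions returning 1.0, 2.0, and 3.0 and `subset_size=2`, each target bootstraps from either 1.0 or 2.0 depending on which pair is drawn, which shows how the random subset varies the minimum from one update to the next.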

Topics & Keywords

Computer Science

Authors (4)

Xinyue Chen

Che Wang

Zijian Zhou

K. Ross

Citation Format

Chen, X., Wang, C., Zhou, Z., & Ross, K. (2021). Randomized Ensembled Double Q-Learning: Learning Fast Without a Model. https://www.semanticscholar.org/paper/736590f70e7f2dc464c1c62491cfa8adb4d718f3

Journal Information

Publication Year: 2021
Language: en
Total Citations: 360
Source Database: Semantic Scholar
Access: Open Access ✓