Option-Critic Algorithm Based on Mutual Information Optimization
Abstract
Temporal abstraction, an important research topic in hierarchical reinforcement learning, allows agents to learn policies at different time scales and can effectively mitigate the sparse-reward problem that is difficult to handle in deep reinforcement learning. How to learn a good temporal-abstraction policy end-to-end remains a research challenge in hierarchical reinforcement learning. Built on the Option framework, the Option-Critic (OC) architecture can effectively address this problem through policy gradient theory. However, during policy learning, the OC framework suffers from a degradation problem in which the action distributions of the intra-option policies become very similar. This degradation harms the experimental performance of the OC framework and leads to poor interpretability of the learned options. To address this problem, mutual information is introduced as an internal reward, and an Option-Critic algorithm with mutual information optimization (MIOOC) is proposed. The MIOOC algorithm is combined with the proximal policy Option-Critic algorithm to ensure the diversity of the lower-level policies. To verify its effectiveness, MIOOC is compared with several common reinforcement learning methods in continuous control environments. Experimental results show that MIOOC speeds up model learning, improves experimental performance, and yields more discriminative intra-option policies.
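The abstract does not give the exact form of the mutual-information internal reward, so the following is only a minimal sketch of one common way such a term is realized in diversity-driven option learning: a learned discriminator q(option | state) provides a variational lower bound on the mutual information between options and visited states, and the per-step internal reward is log q(option | state) − log p(option). The function name, the uniform option prior, and the DIAYN-style bound are assumptions for illustration, not details taken from the MIOOC paper.

```python
import numpy as np

def mi_intrinsic_reward(logits, option, log_prior):
    """Variational mutual-information intrinsic reward (assumed form):
    r_int = log q(option | state) - log p(option),
    where `logits` are a discriminator's unnormalized scores over options
    for the current state. Positive when the state identifies its option."""
    # Numerically stable log-softmax: log q(. | state)
    m = logits.max()
    log_q = logits - (m + np.log(np.exp(logits - m).sum()))
    return log_q[option] - log_prior[option]

# Uniform prior over 4 options (hypothetical setup).
log_prior = np.full(4, -np.log(4.0))
# Discriminator is confident the state was produced by option 2,
# so option 2 earns a positive diversity bonus there.
r = mi_intrinsic_reward(np.array([0.0, 0.0, 5.0, 0.0]), 2, log_prior)
```

In a full agent this bonus would be scaled by a coefficient and added to the environment reward before the proximal-policy Option-Critic update, encouraging intra-option policies to visit distinguishable states and thus stay diverse.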
Topics & Keywords
Authors
LI Junwei, LIU Quan, XU Yapeng
Quick Access
- Publication Year
- 2024
- Source Database
- DOAJ
- DOI
- 10.11896/jsjkx.221100019
- Access
- Open Access ✓