DOAJ Open Access 2024

Option-Critic Algorithm Based on Mutual Information Optimization

LI Junwei, LIU Quan, XU Yapeng

Abstract

As an important research topic in hierarchical reinforcement learning, temporal abstraction allows agents to learn policies at different time scales, which can effectively address the sparse-reward problem that is difficult to handle in deep reinforcement learning. How to learn good temporal-abstraction policies end-to-end has long been a research challenge in hierarchical reinforcement learning. Based on the Option framework, the Option-Critic (OC) architecture can effectively solve this problem through policy gradient theory. However, during policy learning, the OC framework suffers from a degeneracy problem in which the action distributions of the intra-option policies become very similar. This degeneracy hurts the experimental performance of the OC framework and leads to poor interpretability of the learned options. To address this problem, mutual information is introduced as an internal reward, and an Option-Critic algorithm with mutual information optimization (MIOOC) is proposed. The MIOOC algorithm is combined with the proximal policy Option-Critic algorithm to ensure the diversity of the lower-level policies. To verify its effectiveness, MIOOC is compared with several common reinforcement learning methods in continuous control environments. Experimental results show that MIOOC speeds up model learning, improves experimental performance, and yields intra-option policies that are more distinguishable.
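The abstract does not specify how the mutual-information internal reward is computed. As a minimal illustrative sketch (not the paper's implementation; all names and the weighting scheme are assumptions), one common way to score option diversity is the mutual information I(O; A) = H(A) − H(A | O) between the option variable and the action variable at a given state: it is near zero when all intra-option policies output the same action distribution (the degeneracy described above) and grows as the distributions become distinguishable, so it can be added to the environment reward as an intrinsic bonus.

```python
import numpy as np

def option_action_mutual_info(option_action_probs, option_probs=None):
    """Estimate I(O; A) = H(A) - H(A|O) from per-option action
    distributions at one state.

    option_action_probs: array of shape (n_options, n_actions), each row
        an intra-option policy's action distribution.
    option_probs: optional weights P(O); uniform if omitted.
    """
    p = np.asarray(option_action_probs, dtype=float)
    if option_probs is None:
        w = np.full(p.shape[0], 1.0 / p.shape[0])
    else:
        w = np.asarray(option_probs, dtype=float)
    marginal = w @ p  # P(A), marginalized over options
    h_a = -np.sum(marginal * np.log(marginal + 1e-12))          # H(A)
    h_a_given_o = -np.sum(w[:, None] * p * np.log(p + 1e-12))   # H(A|O)
    return h_a - h_a_given_o

# Identical option policies -> MI ~ 0 (the degeneracy the paper targets);
# distinct policies -> positive MI, usable as an intrinsic reward bonus,
# e.g. r_total = r_env + beta * mi for some coefficient beta.
mi_same = option_action_mutual_info([[0.5, 0.5], [0.5, 0.5]])
mi_diff = option_action_mutual_info([[0.9, 0.1], [0.1, 0.9]])
```

Here `mi_same` is essentially zero while `mi_diff` is clearly positive, which is the property an MI-based internal reward exploits to keep lower-level policies diverse.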

Authors


LI Junwei, LIU Quan, XU Yapeng

Citation Format

Li, J., Liu, Q., & Xu, Y. (2024). Option-Critic Algorithm Based on Mutual Information Optimization. https://doi.org/10.11896/jsjkx.221100019

Quick Access

View at Source: doi.org/10.11896/jsjkx.221100019
Journal Information
Publication Year
2024
Source Database
DOAJ
DOI
10.11896/jsjkx.221100019
Access
Open Access ✓