Option-Critic Algorithm Based on Mutual Information Optimization
Abstract
Temporal abstraction, an important research topic in hierarchical reinforcement learning, allows agents to learn policies at different time scales and can effectively mitigate the sparse-reward problem that is difficult to handle in deep reinforcement learning. How to learn a good temporal-abstraction policy end-to-end remains a research challenge in hierarchical reinforcement learning. Built on the Option framework, the Option-Critic (OC) architecture can effectively address this problem through policy gradient theory. However, during policy learning, the OC framework suffers from a degradation problem in which the action distributions of the intra-option policies become very similar. This degradation harms the experimental performance of the OC framework and leads to poor interpretability of the learned options. To address this problem, mutual information is introduced as an internal reward, and an Option-Critic algorithm with mutual information optimization (MIOOC) is proposed. The MIOOC algorithm is combined with the proximal policy Option-Critic algorithm to ensure the diversity of the lower-level policies. To verify its effectiveness, MIOOC is compared with several common reinforcement learning methods in continuous control environments. Experimental results show that MIOOC speeds up model learning, improves experimental performance, and yields more discriminative intra-option policies.
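The abstract does not give the exact form of the mutual-information internal reward, so the following is only a minimal sketch of one common way such a term is realized in diversity-driven option learning: a learned discriminator q(option | state) provides a variational lower bound on the mutual information between options and visited states, and the per-step internal reward is log q(option | state) − log p(option). The function name, the uniform option prior, and the DIAYN-style bound are assumptions for illustration, not details taken from the MIOOC paper.

```python
import numpy as np

def mi_intrinsic_reward(logits, option, log_prior):
    """Variational mutual-information intrinsic reward (assumed form):
    r_int = log q(option | state) - log p(option),
    where `logits` are a discriminator's unnormalized scores over options
    for the current state. Positive when the state identifies its option."""
    # Numerically stable log-softmax: log q(. | state)
    m = logits.max()
    log_q = logits - (m + np.log(np.exp(logits - m).sum()))
    return log_q[option] - log_prior[option]

# Uniform prior over 4 options (hypothetical setup).
log_prior = np.full(4, -np.log(4.0))
# Discriminator is confident the state was produced by option 2,
# so option 2 earns a positive diversity bonus there.
r = mi_intrinsic_reward(np.array([0.0, 0.0, 5.0, 0.0]), 2, log_prior)
```

In a full agent this bonus would be scaled by a coefficient and added to the environment reward before the proximal-policy Option-Critic update, encouraging intra-option policies to visit distinguishable states and thus stay diverse.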
Topics & Keywords
Authors
LI Junwei, LIU Quan, XU Yapeng
Quick Access
- Publication Year
- 2024
- Source Database
- DOAJ
- DOI
- 10.11896/jsjkx.221100019
- Access
- Open Access ✓