Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference
Abstract
Recent advances in reinforcement learning have inspired increasing interest in learning user models adaptively through dynamic interactions, e.g., in reinforcement learning based recommender systems. In most reinforcement learning applications, the reward function provides the critical guideline for optimization. However, current reinforcement learning-based methods rely on manually-defined reward functions, which cannot adapt to dynamic, noisy environments. Moreover, they generally use task-specific reward functions that sacrifice generalization ability. We propose a generative inverse reinforcement learning approach for user behavioral preference modeling to address the above issues. Instead of using predefined reward functions, our model can automatically learn the rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN. Our model provides a general approach to characterizing and explaining underlying behavioral tendencies. Our experiments show that our method outperforms state-of-the-art methods in several scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
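The core idea described above, learning a reward signal adversarially rather than hand-defining it, can be illustrated with a minimal sketch. The snippet below is a toy illustration, not the paper's implementation: it trains a linear WGAN-style critic to separate hypothetical "expert" state-action pairs from a policy's pairs, and the critic's score then serves as the learned reward. All data, dimensions, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: expert state-action pairs are drawn from a
# different distribution than the (initially random) policy's pairs.
expert = rng.normal(loc=1.0, scale=0.5, size=(256, 4))
policy = rng.normal(loc=-1.0, scale=0.5, size=(256, 4))

# Linear WGAN-style critic f(x) = w.x; in adversarial inverse RL,
# the critic's output doubles as the learned reward for the actor.
w = np.zeros(4)
lr = 0.05

for step in range(200):
    # Wasserstein objective: maximize E[f(expert)] - E[f(policy)],
    # so the gradient w.r.t. w is the difference of the sample means.
    grad_w = expert.mean(axis=0) - policy.mean(axis=0)
    w += lr * grad_w
    # Weight clipping: the original WGAN trick for the Lipschitz constraint.
    w = np.clip(w, -1.0, 1.0)

# The learned reward ranks expert-like behavior above the policy's.
reward_expert = (expert @ w).mean()
reward_policy = (policy @ w).mean()
print(reward_expert > reward_policy)  # → True
```

In the full method, this critic would be a neural network trained jointly with an actor-critic policy, with the actor maximizing the learned reward while the critic keeps distinguishing expert trajectories from generated ones.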
Topics & Keywords
Authors (6)
Xiaocong Chen
Lina Yao
Xianzhi Wang
Aixin Sun
Wenjie Zhang
Quan Z. Sheng
Quick Access
- Publication Year
- 2021
- Language
- en
- Total Citations
- 12×
- Source Database
- Semantic Scholar
- DOI
- 10.1109/TKDE.2022.3186920
- Access
- Open Access ✓