Semantic Scholar · Open Access · 2021 · 12 citations

Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Xiaocong Chen, Lina Yao, Xianzhi Wang, Aixin Sun, Wenjie Zhang, Quan Z. Sheng

Abstract

Recent advances in reinforcement learning have inspired increasing interest in learning user modeling adaptively through dynamic interactions, e.g., in reinforcement learning-based recommender systems. In most reinforcement learning applications, reward functions provide the critical guideline for optimization. However, current reinforcement learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic, noisy environments. Moreover, they generally use task-specific reward functions that sacrifice generalization ability. We propose a generative inverse reinforcement learning approach for user behavioral preference modeling to address the above issues. Instead of using predefined reward functions, our model can automatically learn the rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN. Our model provides a general approach to characterizing and explaining underlying behavioral tendencies. Our experiments show our method outperforms state-of-the-art methods in several scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
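The abstract's core idea is to replace a hand-defined reward with one learned adversarially: a WGAN-style critic scores state-action pairs, and that score serves as the reward signal guiding the policy toward expert (user) behavior. Below is a minimal NumPy sketch of that idea under heavy simplifying assumptions (a linear critic, toy Gaussian features, and weight clipping as a crude Lipschitz constraint); it is an illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy state-action features: "expert" (observed user) behavior clusters
# around +1, the current policy's behavior clusters around -1.
expert = rng.normal(loc=1.0, scale=0.5, size=(256, 4))
policy = rng.normal(loc=-1.0, scale=0.5, size=(256, 4))

# Linear WGAN-style critic f(x) = x @ w + b; its score plays the role
# of the learned reward in place of a manually defined reward function.
w = np.zeros(4)
b = 0.0
lr, clip = 0.05, 1.0  # weight clipping approximates a Lipschitz bound

for _ in range(200):
    # Critic loss to minimize: E[f(policy)] - E[f(expert)],
    # so the critic learns to score expert behavior higher.
    grad_w = policy.mean(axis=0) - expert.mean(axis=0)
    w = np.clip(w - lr * grad_w, -clip, clip)

def reward(x):
    """Learned reward for a batch of state-action feature vectors."""
    return x @ w + b

# Expert-like behavior now receives higher learned reward than policy
# rollouts; a policy optimizer would use reward() as its training signal.
print(reward(expert).mean() > reward(policy).mean())  # True
```

In the full method, the critic and the actor-critic policy are trained jointly, so the learned reward adapts as the policy improves rather than being fixed per task.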

Authors (6)

Xiaocong Chen
Lina Yao
Xianzhi Wang
Aixin Sun
Wenjie Zhang
Quan Z. Sheng

Citation Format

Chen, X., Yao, L., Wang, X., Sun, A., Zhang, W., Sheng, Q.Z. (2021). Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference. https://doi.org/10.1109/TKDE.2022.3186920

Quick Access

View at source: doi.org/10.1109/TKDE.2022.3186920

Journal Information

Publication Year: 2021
Language: en
Total Citations: 12×
Source Database: Semantic Scholar
DOI: 10.1109/TKDE.2022.3186920
Access: Open Access ✓