Semantic Scholar Open Access 2015 75 citations

Multi-Objective MDPs with Conditional Lexicographic Reward Preferences

Kyle Hollins Wray S. Zilberstein A. Mouaddib

Abstract

Sequential decision problems that involve multiple objectives are prevalent. Consider for example a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish its game theoretic properties. The performance of LVI in practice is tested within a realistic benchmark problem in the domain of semi-autonomous driving. Finally, we demonstrate how GPU-based optimization can improve the scalability of LVI and other value iteration algorithms for MDPs.
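The abstract describes planning with prioritized objectives and slack, where lower-priority objectives may only be optimized over actions that remain near-optimal for higher-priority ones. As a rough illustration only (this is not the paper's LVI algorithm, and all names here are hypothetical), a lexicographic value-iteration loop over a multi-objective MDP might look like:

```python
import numpy as np

def lexicographic_vi(P, rewards, gamma=0.95, slack=0.0, iters=200):
    """Toy lexicographic value iteration (illustrative sketch only).

    P       : (A, S, S) transition probabilities P[a, s, s'].
    rewards : list of (S, A) reward arrays, highest priority first.
    slack   : how much value a higher-priority objective may concede
              when lower-priority objectives are optimized.
    """
    A, S, _ = P.shape
    allowed = np.ones((S, A), dtype=bool)  # actions still admissible per state
    values = []
    for R in rewards:
        V = np.zeros(S)
        for _ in range(iters):
            # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
            Q = R + gamma * np.einsum("asz,z->sa", P, V)
            V = np.where(allowed, Q, -np.inf).max(axis=1)
        # keep only actions within `slack` of this objective's optimum,
        # so the next (lower-priority) objective chooses among them
        Q = np.where(allowed, R + gamma * np.einsum("asz,z->sa", P, V), -np.inf)
        allowed = allowed & (Q >= Q.max(axis=1, keepdims=True) - slack)
        values.append(V)
    return values, allowed
```

With `slack = 0` this reduces to strict lexicographic tie-breaking: later objectives only distinguish among actions that are exactly optimal for earlier ones; a positive slack trades some higher-priority value for lower-priority gains.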


Authors (3)

Kyle Hollins Wray

S. Zilberstein

A. Mouaddib

Citation Format

Wray, K.H., Zilberstein, S., & Mouaddib, A. (2015). Multi-Objective MDPs with Conditional Lexicographic Reward Preferences. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9647

Quick Access

View at Source: doi.org/10.1609/aaai.v29i1.9647

Journal Information
Publication Year: 2015
Language: en
Total Citations: 75×
Source Database: Semantic Scholar
DOI: 10.1609/aaai.v29i1.9647
Access: Open Access ✓