arXiv Open Access 2023

Online Decision Making with History-Average Dependent Costs (Extended)

Vijeth Hebbar Cedric Langbort

Lihat Sumber

Abstrak

In many online sequential decision-making scenarios, a learner's choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of history horizon on design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.

Topik & Kata Kunci

cs.LG eess.SY

Penulis (2)

Vijeth Hebbar

Cedric Langbort

Format Sitasi

APA MLA BibTeX

Hebbar, V., Langbort, C. (2023). Online Decision Making with History-Average Dependent Costs (Extended). https://arxiv.org/abs/2312.06641

Akses Cepat

Lihat di Sumber

Informasi Jurnal

Tahun Terbit: 2023
Bahasa: en
Sumber Database: arXiv
Akses: Open Access ✓