arXiv Open Access 2024

Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

Parisa Davar, Frédéric Godin, Jose Garrido

Abstract

This paper tackles the problem of mitigating catastrophic risk (risk with very low frequency but very high severity) in the context of a sequential decision-making process. This problem is particularly challenging due to the scarcity of observations in the far tail of the distribution of cumulative costs (negative rewards). We develop a policy gradient algorithm, called POTPG, based on approximations of the tail risk derived from extreme value theory. Numerical experiments highlight the outperformance of our method over common benchmarks that rely on the empirical distribution. An application to financial risk management, more precisely to the dynamic hedging of a financial option, is presented.
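The abstract does not spell out the tail-risk estimator, so the following is only a minimal sketch of the generic peaks-over-threshold (POT) idea from extreme value theory, not the paper's POTPG algorithm. It fits a generalized Pareto distribution to the excesses of sampled cumulative costs over a high threshold and reads off VaR and CVaR from the standard POT formulas. The function name `pot_tail_risk`, the 90% threshold quantile, the method-of-moments fit, and the synthetic Pareto cost sample are all assumptions made for illustration.

```python
import math
import random
import statistics


def pot_tail_risk(costs, alpha=0.99, threshold_quantile=0.90):
    """Peaks-over-threshold estimate of VaR and CVaR at level alpha.

    Hypothetical helper: fits a generalized Pareto distribution (GPD) to
    the excesses over an empirical threshold using the method of moments
    (an assumption; maximum likelihood is another common choice).
    """
    n = len(costs)
    ordered = sorted(costs)
    u = ordered[int(threshold_quantile * n)]      # empirical threshold
    excesses = [c - u for c in costs if c > u]    # peaks over threshold
    n_u = len(excesses)

    # Method-of-moments GPD fit: mean = sigma / (1 - xi),
    # variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi)), valid for xi < 1/2.
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)

    # Scaled tail probability; the formulas below assume alpha lies
    # beyond the threshold, i.e. tail_prob < 1.
    tail_prob = (n / n_u) * (1.0 - alpha)
    if abs(xi) < 1e-12:
        # Exponential limit of the GPD as xi -> 0.
        var = u - sigma * math.log(tail_prob)
        cvar = var + sigma
    else:
        var = u + (sigma / xi) * (tail_prob ** (-xi) - 1.0)
        cvar = (var + sigma - xi * u) / (1.0 - xi)
    return var, cvar, xi, sigma


# Usage on synthetic heavy-tailed costs (illustrative data only).
rng = random.Random(0)
sample_costs = [rng.paretovariate(3.0) for _ in range(5000)]
var_99, cvar_99, xi_hat, sigma_hat = pot_tail_risk(sample_costs)
```

The appeal of such an estimator over a purely empirical one is that it extrapolates from all threshold exceedances rather than relying on the handful of observations beyond the target quantile.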

Topics & Keywords

cs.LG q-fin.RM

Authors (3)

Parisa Davar

Frédéric Godin

Jose Garrido

Citation Format

Davar, P., Godin, F., & Garrido, J. (2024). Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients. https://arxiv.org/abs/2406.15612

Journal Information

Year Published: 2024
Language: en
Database Source: arXiv
Access: Open Access ✓