arXiv Open Access 2025

Myopic Optimality: why reinforcement learning portfolio management strategies lose money

Yuming Ma

Abstract

Myopic optimization (MO) outperforms reinforcement learning (RL) in portfolio management: RL yields lower or negative returns, higher variance, larger costs, heavier CVaR, lower profitability, and greater model risk. We model execution and liquidation frictions under mark-to-market accounting. Using Malliavin calculus (Clark-Ocone/BEL), we derive policy gradients and the risk shadow price, unifying HJB and KKT. This yields dual-gap and convergence results: geometric convergence for MO versus floors for RL. We quantify phantom profit in RL via a Malliavin policy-gradient contamination analysis and define a control-affects-dynamics (CAD) premium for RL, which is plausibly positive.

Authors (1)

Yuming Ma

Citation Format

Ma, Y. (2025). Myopic Optimality: why reinforcement learning portfolio management strategies lose money. https://arxiv.org/abs/2509.12764

Journal Information
Publication Year
2025
Language
en
Source Database
arXiv
Access
Open Access ✓