The Value of Blending—Managing Ameliorating Inventory Using Deep Reinforcement Learning
Abstract
Stocks of some food products, such as whiskey, cheese, or port wine, ameliorate during storage, facilitating product differentiation according to age. This induces a trade-off between immediate revenues and further maturation. Inventory management decisions include purchasing volumes of agricultural produce and production volumes for age-differentiated products. Because products can be blended from stocks of different ages, issuance decisions offer operational flexibility. However, whereas some industries (port wine, sherry) only request that the product labels refer to the average age of issued stocks, others (whiskey, rum) have stricter blending regulations, requiring that the product labels represent the minimum age of all components. Further, producers must deal with multiple uncertainties. Purchase prices of agricultural commodities depend on volatile climate-dependent harvest seasons, stocks decay during maturation, and sales market conditions fluctuate. We solve this inventory management problem using a deep reinforcement learning algorithm with three key innovations: (i) A novel actor pipeline that decomposes the action space and flexibly partitions decision dimensions between a neural network and a lookahead optimization model, (ii) an algorithm explicitly maximizing average rewards, and (iii) reward-handling techniques that exploit structural problem insights. Our approach yields near-optimal policies that consistently outperform benchmark heuristics. Beyond the algorithmic contributions, our results offer new managerial insights into the value of blending under uncertainty. Minimum-age blending substantially enhances the profits of firms as compared to no blending because companies can adjust their purchasing policy in response to price fluctuations. The more flexible average-age regime further improves profits by 8.7 % on average, suggesting that whiskey and rum regulators may wish to reconsider their strict blending rules. 
We mine black-box policies from deep reinforcement learning using supervised machine learning and Shapley values to analyze near-optimal decision drivers. Exploiting the value of blending requires producers to install sufficient processing capacity, especially when dealing with large variations in harvest seasons. Additionally, blending entails increased planning complexity because the inventory management decisions are driven by a large number of factors.
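The difference between the two labeling regimes described in the abstract can be illustrated with a small sketch. This is not the authors' model: the `Component` type, the function names, and the example volumes are hypothetical, and the sketch assumes the average age is volume-weighted, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One stock drawn from inventory for a blend (hypothetical type)."""
    age_years: float  # age of the stock at issuance
    volume: float     # volume of this stock used in the blend


def minimum_age_label(blend):
    """Label age under a minimum-age rule (whiskey, rum): the youngest component."""
    return min(c.age_years for c in blend)


def average_age_label(blend):
    """Label age under an average-age rule (port wine, sherry).

    Assumption: the average is weighted by component volume.
    """
    total_volume = sum(c.volume for c in blend)
    if total_volume <= 0:
        raise ValueError("blend must have positive total volume")
    return sum(c.age_years * c.volume for c in blend) / total_volume


# Illustrative blend: 60 units of 8-year-old stock and 40 units of 12-year-old stock.
blend = [Component(age_years=8, volume=60.0), Component(age_years=12, volume=40.0)]
min_label = minimum_age_label(blend)
avg_label = average_age_label(blend)
```

Under these assumptions, the same blend is labeled by its youngest component under the minimum-age rule and by a higher, volume-weighted age under the average-age rule, which is the added flexibility the abstract attributes to the average-age regime.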
Authors (2)
Alexander Pahr
Martin Grunow
Quick Access
- Publication Year: 2025
- Language: en
- Source Database: CrossRef
- DOI: 10.1177/10591478251387795
- Access: Open Access ✓