Revenue Management With Nonparametric Demand Learning and Product Returns
Abstract
Product returns are prevalent in practice. Many retailers offer lenient free-return policies, but only within a specific return window during which customers may return products. Motivated by this phenomenon, we consider a single-product online learning and pricing problem with stochastic product returns. A salient feature is that the demand function, which depends on both the price and the return window, is initially unknown and must be learned on the fly. The retailer thus faces the classic exploration–exploitation trade-off. Moreover, we consider an inventory constraint, which introduces an additional trade-off between earning revenue and managing inventory. We propose a modeling framework that integrates pricing and return window decisions, and we develop a deterministic fluid model that serves as the full-information benchmark. To tackle the learning problem, we design a novel nonparametric learning algorithm that integrates inverse stochastic gradient descent (SGD) with the Upper Confidence Bound (UCB) method. Under mild assumptions on the demand and revenue functions, we establish a regret upper bound of $O(\sqrt{WT}\log T)$ for our learning algorithm, where $W$ denotes the number of return window candidates and $T$ denotes the time horizon. This result aligns with the lower bounds established in both the online pricing and the multi-armed bandit (MAB) literature. Numerical experiments verify the effectiveness and robustness of our algorithm across various environments. From an operational standpoint, retailers can use our learning framework as a decision-support tool to identify the optimal price and return window.
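The bandit side of the problem, choosing among W return-window candidates while learning which one earns the most, can be illustrated with textbook UCB1. This is a minimal sketch under simplifying assumptions, not the authors' algorithm: the price is held fixed, there is no inventory constraint, the inverse-SGD pricing step is omitted, and each period's net revenue (after returns) is assumed to be normalized to [0, 1]. The function names `ucb1_return_windows` and `toy_net_revenue`, the window lengths, and the toy means are all hypothetical.

```python
import math
import random


def ucb1_return_windows(windows, sample_revenue, horizon, seed=0):
    """Run textbook UCB1 over a finite set of return-window candidates.

    windows: candidate return windows (the W arms).
    sample_revenue: hypothetical callback (window, rng) -> one period's
        net revenue, assumed normalized to [0, 1].
    horizon: number of selling periods T.
    Returns (counts, means): pulls per window and empirical mean net revenue.
    """
    rng = random.Random(seed)
    n_arms = len(windows)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every window once
        else:
            # Optimism: empirical mean plus a confidence bonus that
            # shrinks as a window is tried more often.
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = sample_revenue(windows[arm], rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


# Hypothetical toy environment: expected net revenue (after returns) per
# window length in days, normalized to [0, 1]. Illustrative numbers only.
TOY_MEANS = {7: 0.2, 14: 0.7, 30: 0.4}


def toy_net_revenue(window, rng):
    """Bernoulli net revenue with the hypothetical mean for this window."""
    return 1.0 if rng.random() < TOY_MEANS[window] else 0.0


if __name__ == "__main__":
    counts, means = ucb1_return_windows([7, 14, 30], toy_net_revenue, horizon=5000)
    print(counts, [round(m, 3) for m in means])
```

In this toy run, most pulls should concentrate on the 14-day window, the arm with the highest hypothetical mean. The square-root bonus is the standard UCB1 choice; the algorithm described in the abstract additionally learns the price via inverse SGD and manages inventory, neither of which is modeled here.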
Authors (2)
Sheng Ji
Yi Yang
Quick Access
- Year of Publication: 2026
- Language: en (English)
- Database Source: CrossRef
- DOI: 10.1177/10591478261424032
- Access: Open Access