Paper Title
Learning to Price Supply Chain Contracts against a Learning Retailer
Paper Authors
Paper Abstract
The rise of big data analytics has automated the decision-making of companies and increased supply chain agility. In this paper, we study the supply chain contract design problem faced by a data-driven supplier who needs to respond to the inventory decisions of the downstream retailer. Both the supplier and the retailer are uncertain about the market demand and need to learn about it sequentially. The goal for the supplier is to develop data-driven pricing policies with sublinear regret bounds under a wide range of possible retailer inventory policies for a fixed time horizon. To capture the dynamics induced by the retailer's learning policy, we first make a connection to non-stationary online learning by following the notion of variation budget. The variation budget quantifies the impact of the retailer's learning strategy on the supplier's decision-making. We then propose dynamic pricing policies for the supplier for both discrete and continuous demand. Our proposed pricing policies only require access to the support of the demand distribution; critically, they do not require the supplier to have any prior knowledge about the retailer's learning policy or the demand realizations. We examine several well-known data-driven policies for the retailer, including sample average approximation, distributionally robust optimization, and parametric approaches, and show that our pricing policies lead to sublinear regret bounds in all these cases. At the managerial level, we answer affirmatively that the supplier has a pricing policy with a sublinear regret bound under a wide range of retailer learning policies, even though she faces a learning retailer and an unknown demand distribution. Our work also provides a novel perspective on data-driven operations management, in which the principal has to learn to react to the learning policies employed by other agents in the system.