论文标题
在价格保护保证下学习和赚钱方面的阶段过渡
Phase Transitions in Learning and Earning under Price Protection Guarantee
论文作者
论文摘要
受到``价格保护保证''的普遍性的激励,该公司在所谓的价格保护期间(通常将其定义为购买日期后的某个时间窗口定义为某个时间窗口),以防卖方决定降低价格降低价格,我们研究了在网上学习algorith nectivaliven dynamsig pricient pricikic pricikic priciking priciking priciking pricike,``价格保护保证''允许过去购买产品的顾客。在此设置中,销售产品的时间是$ t $的时间,我们表征了$ m $的价值,价格的长度可以影响学习过程的最佳遗憾。然后,我们提出了一个实例。具体来说,与没有价格保护的经典设置相比,最佳的遗憾没有重大差异。
Motivated by the prevalence of ``price protection guarantee", which allows a customer who purchased a product in the past to receive a refund from the seller during the so-called price protection period (typically defined as a certain time window after the purchase date) in case the seller decides to lower the price, we study the impact of such policy on the design of online learning algorithm for data-driven dynamic pricing with initially unknown customer demand. We consider a setting where a firm sells a product over a horizon of $T$ time steps. For this setting, we characterize how the value of $M$, the length of price protection period, can affect the optimal regret of the learning process. We show that the optimal regret is $\tildeΘ(\sqrt{T}+\min\{M,\,T^{2/3}\})$ by first establishing a fundamental impossible regime with novel regret lower bound instances. Then, we propose LEAP, a phased exploration type algorithm for \underline{L}earning and \underline{EA}rning under \underline{P}rice Protection to match this lower bound up to logarithmic factors or even doubly logarithmic factors (when there are only two prices available to the seller). Our results reveal the surprising phase transitions of the optimal regret with respect to $M$. Specifically, when $M$ is not too large, the optimal regret has no major difference when compared to that of the classic setting with no price protection guarantee. We also show that there exists an upper limit on how much the optimal regret can deteriorate when $M$ grows large. Finally, we conduct extensive numerical experiments to show the benefit of LEAP over other heuristic methods for this problem.