论文标题
通过回归甲骨文可实现的最佳上下文匪徒,带有背包。
Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles
论文作者
论文摘要
我们研究了带有背包(CBWK)问题的随机上下文匪徒,在这种情况下,每个动作都在上下文下,不仅会带来随机的奖励,而且还会以向量形式付出随机的资源消耗。面临的挑战是在不违反每个资源的预算的情况下最大化总奖励。我们在一般可靠性设置下研究了此问题,其中预期的奖励和预期成本是某些给定的通用功能类中上下文和动作的函数$ \ MATHCAL {F} $和$ \ MATHCAL {G} $。 CBWK上的现有作品仅限于线性函数类,因为它们使用了UCB型算法,该算法在很大程度上依赖于线性形式,因此很难扩展到一般函数类。通过成功应用于上下文匪徒的在线回归座的动机,我们通过将其减少到在线回归中,为CBWK提出了第一个通用和最佳的算法框架。我们还建立了较低的遗憾,以显示各种功能类别的算法的最佳性。
We study the stochastic contextual bandit with knapsacks (CBwK) problem, where each action, taken upon a context, not only leads to a random reward but also costs a random resource consumption in a vector form. The challenge is to maximize the total reward without violating the budget for each resource. We study this problem under a general realizability setting where the expected reward and expected cost are functions of contexts and actions in some given general function classes $\mathcal{F}$ and $\mathcal{G}$, respectively. Existing works on CBwK are restricted to the linear function class since they use UCB-type algorithms, which heavily rely on the linear form and thus are difficult to extend to general function classes. Motivated by online regression oracles that have been successfully applied to contextual bandits, we propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression. We also establish the lower regret bound to show the optimality of our algorithm for a variety of function classes.