用矢量成本和土匪与背包的在线学习

论文标题

用矢量成本和土匪与背包的在线学习

Online Learning with Vector Costs and Bandits with Knapsacks

论文作者

Kesselheim, Thomas, Singla, Sahil

论文摘要

我们介绍了以矢量成本（\ olvcp）的在线学习，其中每次步骤$ t \ in \ {1，\ ldots，t \} $，我们需要播放一个action $ i \ in \ {1，\ ldots，n \} $，这使vector成本不知名，这是$ [0,1]^n in $ [0,1]^{d} $^{d} $^{d} $。在线算法的目标是最大程度地减少其成本向量总和的$ \ ell_p $规范。这捕获了$ d = 1 $的经典在线学习设置，并且对于一般$ d $来说很有趣，因为在线调度等应用程序中，我们想在其中平衡不同机器之间的负载（尺寸）。我们在随机和对抗性到达设置中研究\ olvCP，并给出一般的程序，将问题从$ d $尺寸减少到单个维度。这使我们能够在完整的反馈模型和匪徒反馈模型中使用经典的在线学习算法来获得（接近）最佳结果。特别是，我们获得了一种单个算法（直至学习率的选择），这给随机来临和一个紧密的$ O（\ min \ {p，\ log d \}）$竞争比率均为对抗性到达的竞争比。 \ olVCP问题在试图用背包（\ bwk）问题解决流行的土匪时，也是自然的子问题。这种连接使我们能够使用\ olVCP技术在随机设置和对抗性设置中为\ bwk获得（接近）最佳结果。特别是，我们获得了一个紧密的$ o（\ log d \ cdot \ log t）$竞争比算法的对抗性\ bwk，它改进了Immorlica等人的$ O（d \ cdot \ log t）$竞争比率算法。 [焦点19]。

We introduce online learning with vector costs (\OLVCp) where in each time step $t \in \{1,\ldots, T\}$, we need to play an action $i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in $[0,1]^{d}$. The goal of the online algorithm is to minimize the $\ell_p$ norm of the sum of its cost vectors. This captures the classical online learning setting for $d=1$, and is interesting for general $d$ because of applications like online scheduling where we want to balance the load between different machines (dimensions). We study \OLVCp in both stochastic and adversarial arrival settings, and give a general procedure to reduce the problem from $d$ dimensions to a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight $O(\min\{p, \log d\})$ competitive ratio for adversarial arrivals. The \OLVCp problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (\BwK) problem. This connection allows us to use our \OLVCp techniques to obtain (near) optimal results for \BwK in both stochastic and adversarial settings. In particular, we obtain a tight $O(\log d \cdot \log T)$ competitive ratio algorithm for adversarial \BwK, which improves over the $O(d \cdot \log T)$ competitive ratio algorithm of Immorlica et al. [FOCS'19].

下载PDF全文

下载文献需遵守相关版权规定

论文标题