Title


Trading Off Resource Budgets for Improved Regret Bounds

Authors

Damon Falck, Thomas Orton

Abstract


In this work we consider a variant of adversarial online learning where in each round one picks $B$ out of $N$ arms and incurs cost equal to the $\textit{minimum}$ of the costs of each arm chosen. We propose an algorithm called Follow the Perturbed Multiple Leaders (FPML) for this problem, which we show (by adapting the techniques of Kalai and Vempala [2005]) achieves expected regret $\mathcal{O}(T^{\frac{1}{B+1}}\ln(N)^{\frac{B}{B+1}})$ over time horizon $T$ relative to the $\textit{single}$ best arm in hindsight. This introduces a trade-off between the budget $B$ and the single-best-arm regret, and we proceed to investigate several applications of this trade-off. First, we observe that algorithms which use standard regret minimizers as subroutines can sometimes be adapted by replacing these subroutines with FPML, and we use this to generalize existing algorithms for Online Submodular Function Maximization [Streeter and Golovin, 2008] in both the full feedback and semi-bandit feedback settings. Next, we empirically evaluate our new algorithms on an online black-box hyperparameter optimization problem. Finally, we show how FPML can lead to new algorithms for Linear Programming which require stronger oracles in exchange for fewer oracle calls.
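To make the setting concrete, here is a minimal Python sketch of the FPML idea in the full-feedback case. The update rule below is an assumption extrapolated from the algorithm's name and its Kalai and Vempala [2005] lineage, not the paper's pseudocode: each round, perturb every arm's cumulative cost with fresh i.i.d. exponential noise and play the $B$ arms with the smallest perturbed totals. The function name `fpml_total_cost` and the parameter `epsilon` are illustrative.

```python
import numpy as np

def fpml_total_cost(cost_matrix: np.ndarray, B: int, epsilon: float,
                    seed: int = 0) -> float:
    """Sketch of Follow the Perturbed Multiple Leaders (full feedback).

    cost_matrix: (T, N) array; cost_matrix[t, i] is the adversarial cost
    of arm i in round t. Each round we play B arms and pay only the
    minimum cost among them. NOTE: the perturb-and-select rule here is an
    assumption in the spirit of Kalai-Vempala FTPL, not the paper's exact
    algorithm.
    """
    T, N = cost_matrix.shape
    rng = np.random.default_rng(seed)
    cumulative = np.zeros(N)      # cumulative cost of each arm observed so far
    total = 0.0
    for t in range(T):
        # Fresh i.i.d. exponential perturbations each round drive exploration.
        noise = rng.exponential(scale=1.0 / epsilon, size=N)
        chosen = np.argsort(cumulative - noise)[:B]    # the B "perturbed leaders"
        total += cost_matrix[t, chosen].min()          # pay the cheapest chosen arm
        cumulative += cost_matrix[t]                   # full feedback: all costs revealed
    return total

# Example: 200 rounds, 50 arms, i.i.d. uniform costs, budget B = 3.
costs = np.random.default_rng(1).uniform(size=(200, 50))
print(fpml_total_cost(costs, B=3, epsilon=1.0))
```

Note how the stated bound interpolates with the budget: setting $B = 1$ recovers the familiar $\mathcal{O}(\sqrt{T \ln N})$ single-leader rate, while larger budgets shrink the dependence on $T$ to $T^{\frac{1}{B+1}}$ at the price of holding $B$ arms per round.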
