与切换成本的匪徒，两全其美的范围更好

论文标题

与切换成本的匪徒，两全其美的范围更好

Better Best of Both Worlds Bounds for Bandits with Switching Costs

论文作者

Amir, Idan, Azov, Guy, Koren, Tomer, Livni, Roi

论文摘要

我们研究了带有切换成本的土匪的最佳世界算法，最近由Rouyer，Seldin和Cesa-Bianchi提到，2021年。我们介绍了一种令人惊讶的简单有效的算法，同时实现了最佳的最佳遗憾，以$ \ Mathcal {O}（O}（O}（t^^2/3}）） $ \ MATHCAL {O}（\ min \ {\ log（t）/δ^2，T^{2/3} \}）$在随机约束的政权中，均具有（单位）切换成本，其中$δ$是$Δ$，其中$Δ$是手臂之间的差距。在随机限制的情况下，由于Rouyer等人，我们的界限比以前的结果改善了，这使$ \ Mathcal {o}（t^{1/3}/δ）$感到遗憾。我们的结果伴随着一个下限，通常表明，$ \tildeΩ（\ min \ {1/δ^2，t^{2/3} \}）$遗憾在随机限制的算法中不可避免地用于算法，用于使用$ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {o}（t^o}（t^^{2/2/3} $ wort）。

We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves minimax optimal regret bound of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting and a bound of $\mathcal{O}(\min\{\log (T)/Δ^2,T^{2/3}\})$ in the stochastically-constrained regime, both with (unit) switching costs, where $Δ$ is the gap between the arms. In the stochastically constrained case, our bound improves over previous results due to Rouyer et al., that achieved regret of $\mathcal{O}(T^{1/3}/Δ)$. We accompany our results with a lower bound showing that, in general, $\tildeΩ(\min\{1/Δ^2,T^{2/3}\})$ regret is unavoidable in the stochastically-constrained case for algorithms with $\mathcal{O}(T^{2/3})$ worst-case regret.

下载PDF全文

下载文献需遵守相关版权规定

论文标题