合作和随机多人多军强盗：既不通信也不相撞的最佳遗憾

论文标题

合作和随机多人多军强盗：既不通信也不相撞的最佳遗憾

Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions

论文作者

Bubeck, Sébastien, Budzinski, Thomas, Sellke, Mark

论文摘要

我们考虑了随机多臂匪徒问题的合作多人版本。我们研究玩家无法交流但可以访问共享随机性的制度。在前两位作者的先前工作中，为两个球员和三个武器构建了该制度的策略，遗憾的是$ \ tilde {o}（\ sqrt {t}）$，并且在玩家之间完全没有碰撞（具有很高的可能性）。在本文中，我们表明这些属性（几乎是最佳的遗憾和根本没有冲突）对于任何数量的玩家和武器都可以实现。在高水平上，以前的策略在很大程度上依赖于$ 2 $维的几何直觉，在较高的维度上很难概括，而在这里，我们采取了更加组合的路线来构建新策略。

We consider the cooperative multi-player version of the stochastic multi-armed bandit problem. We study the regime where the players cannot communicate but have access to shared randomness. In prior work by the first two authors, a strategy for this regime was constructed for two players and three arms, with regret $\tilde{O}(\sqrt{T})$, and with no collisions at all between the players (with very high probability). In this paper we show that these properties (near-optimal regret and no collisions at all) are achievable for any number of players and arms. At a high level, the previous strategy heavily relied on a $2$-dimensional geometric intuition that was difficult to generalize in higher dimensions, while here we take a more combinatorial route to build the new strategy.

下载PDF全文

下载文献需遵守相关版权规定

论文标题