论文标题

合作和随机多人多军强盗:既不通信也不相撞的最佳遗憾

Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions

论文作者

Bubeck, Sébastien, Budzinski, Thomas, Sellke, Mark

论文摘要

我们考虑了随机多臂匪徒问题的合作多人版本。我们研究玩家无法交流但可以访问共享随机性的制度。在前两位作者的先前工作中,为两个球员和三个武器构建了该制度的策略,遗憾的是$ \ tilde {o}(\ sqrt {t})$,并且在玩家之间完全没有碰撞(具有很高的可能性)。在本文中,我们表明这些属性(几乎是最佳的遗憾和根本没有冲突)对于任何数量的玩家和武器都可以实现。在高水平上,以前的策略在很大程度上依赖于$ 2 $维的几何直觉,在较高的维度上很难概括,而在这里,我们采取了更加组合的路线来构建新策略。

We consider the cooperative multi-player version of the stochastic multi-armed bandit problem. We study the regime where the players cannot communicate but have access to shared randomness. In prior work by the first two authors, a strategy for this regime was constructed for two players and three arms, with regret $\tilde{O}(\sqrt{T})$, and with no collisions at all between the players (with very high probability). In this paper we show that these properties (near-optimal regret and no collisions at all) are achievable for any number of players and arms. At a high level, the previous strategy heavily relied on a $2$-dimensional geometric intuition that was difficult to generalize in higher dimensions, while here we take a more combinatorial route to build the new strategy.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源