论文标题
多人匪徒对对抗性碰撞
Multi-Player Bandits Robust to Adversarial Collisions
论文作者
论文摘要
近年来,由随机多武器多武器匪徒进行了认知无线电的动机。在这种情况下,每个玩家都会拉动手臂,并获得与手臂相对应的奖励,如果没有碰撞,即手臂是由一个单身玩家选择的。否则,如果发生碰撞,玩家将不会获得奖励。在本文中,我们考虑了恶意球员(或攻击者)的存在,他们会通过故意与他们相撞来阻碍合作者(或捍卫者)最大化自己的回报。我们为防守者提供了第一个去中心化且强大的算法Resync,其性能优雅地将其降低为$ \ tilde {o}(c)$,因为攻击者的碰撞数量$ c $增加。我们证明,通过证明下限为$ω(c)$,该算法是最佳的订单最佳选择。该算法对攻击者使用的算法和对攻击者面临的碰撞数量的不可知论是不可知的。
Motivated by cognitive radios, stochastic Multi-Player Multi-Armed Bandits has been extensively studied in recent years. In this setting, each player pulls an arm, and receives a reward corresponding to the arm if there is no collision, namely the arm was selected by one single player. Otherwise, the player receives no reward if collision occurs. In this paper, we consider the presence of malicious players (or attackers) who obstruct the cooperative players (or defenders) from maximizing their rewards, by deliberately colliding with them. We provide the first decentralized and robust algorithm RESYNC for defenders whose performance deteriorates gracefully as $\tilde{O}(C)$ as the number of collisions $C$ from the attackers increases. We show that this algorithm is order-optimal by proving a lower bound which scales as $Ω(C)$. This algorithm is agnostic to the algorithm used by the attackers and agnostic to the number of collisions $C$ faced from attackers.