论文标题

在游戏中进行对冲:外部和掉期后悔的更快融合

Hedging in games: Faster convergence of external and swap regrets

论文作者

Chen, Xi, Peng, Binghui

论文摘要

我们考虑玩家运行篱笆算法或其乐观变体以反复玩$ n $ thums的设置。 1)对于两人游戏,我们表明,乐观的树篱的遗憾在$ \ tilde {o}(1/t ^{5/6})$中,改善了先前的绑定$ O(1/t ^{3/4})$,由Syrgkanis,Agarwal,Luo和Schapire(Luo and Schapire(Nips'15))。 2)相反,我们表明,香草对冲的收敛速率不超过$ \tildeΩ(1/ \ sqrt {t})$,解决了在Syrgkanis,Agarwal,Luo,Luo和Schapire(NIPS'15)中发布的一个开放问题。 对于一般的M-玩家游戏,我们表明每个玩家以速率$ \ tilde {o}(M^{1/2}(N/T)^{3/4})$衰减的互换后悔,当他们将乐观的对冲与经典外部对蓝色和曼苏尔的经典外部减少(JMLR'07)相结合时。还可以修改该算法以实现相同的速率,并且$ \ tilde {o}(\ sqrt {n/t})$对对抗者的速率。通过标准连接,我们的上限还意味着在两人游戏中,与粗相关的平衡速度更快,并且在多人游戏中相关联的平衡。

We consider the setting where players run the Hedge algorithm or its optimistic variant to play an $n$-action game repeatedly for $T$ rounds. 1) For two-player games, we show that the regret of optimistic Hedge decays at $\tilde{O}( 1/T ^{5/6} )$, improving the previous bound $O(1/T^{3/4})$ by Syrgkanis, Agarwal, Luo and Schapire (NIPS'15) 2) In contrast, we show that the convergence rate of vanilla Hedge is no better than $\tildeΩ(1/ \sqrt{T})$, addressing an open question posted in Syrgkanis, Agarwal, Luo and Schapire (NIPS'15). For general m-player games, we show that the swap regret of each player decays at rate $\tilde{O}(m^{1/2} (n/T)^{3/4})$ when they combine optimistic Hedge with the classical external-to-internal reduction of Blum and Mansour (JMLR'07). The algorithm can also be modified to achieve the same rate against itself and a rate of $\tilde{O}(\sqrt{n/T})$ against adversaries. Via standard connections, our upper bounds also imply faster convergence to coarse correlated equilibria in two-player games and to correlated equilibria in multiplayer games.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源