在游戏中进行对冲：外部和掉期后悔的更快融合

论文标题

在游戏中进行对冲：外部和掉期后悔的更快融合

Hedging in games: Faster convergence of external and swap regrets

论文作者

Chen, Xi, Peng, Binghui

论文摘要

我们考虑玩家运行篱笆算法或其乐观变体以反复玩$ n $ thums的设置。 1）对于两人游戏，我们表明，乐观的树篱的遗憾在$ \ tilde {o}（1/t ^{5/6}）$中，改善了先前的绑定$ O（1/t ^{3/4}）$，由Syrgkanis，Agarwal，Luo和Schapire（Luo and Schapire（Nips'15））。 2）相反，我们表明，香草对冲的收敛速率不超过$ \tildeΩ（1/ \ sqrt {t}）$，解决了在Syrgkanis，Agarwal，Luo，Luo和Schapire（NIPS'15）中发布的一个开放问题。对于一般的M-玩家游戏，我们表明每个玩家以速率$ \ tilde {o}（M^{1/2}（N/T）^{3/4}）$衰减的互换后悔，当他们将乐观的对冲与经典外部对蓝色和曼苏尔的经典外部减少（JMLR'07）相结合时。还可以修改该算法以实现相同的速率，并且$ \ tilde {o}（\ sqrt {n/t}）$对对抗者的速率。通过标准连接，我们的上限还意味着在两人游戏中，与粗相关的平衡速度更快，并且在多人游戏中相关联的平衡。

We consider the setting where players run the Hedge algorithm or its optimistic variant to play an $n$-action game repeatedly for $T$ rounds. 1) For two-player games, we show that the regret of optimistic Hedge decays at $\tilde{O}( 1/T ^{5/6} )$, improving the previous bound $O(1/T^{3/4})$ by Syrgkanis, Agarwal, Luo and Schapire (NIPS'15) 2) In contrast, we show that the convergence rate of vanilla Hedge is no better than $\tildeΩ(1/ \sqrt{T})$, addressing an open question posted in Syrgkanis, Agarwal, Luo and Schapire (NIPS'15). For general m-player games, we show that the swap regret of each player decays at rate $\tilde{O}(m^{1/2} (n/T)^{3/4})$ when they combine optimistic Hedge with the classical external-to-internal reduction of Blum and Mansour (JMLR'07). The algorithm can also be modified to achieve the same rate against itself and a rate of $\tilde{O}(\sqrt{n/T})$ against adversaries. Via standard connections, our upper bounds also imply faster convergence to coarse correlated equilibria in two-player games and to correlated equilibria in multiplayer games.

下载PDF全文

下载文献需遵守相关版权规定

论文标题