Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games

Paper Title

Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games

Authors

Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Kentaro Toyoshima, Atsushi Iwasaki

Abstract

This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings. In the former, players observe their exact gradient vectors of the utility functions. In the latter, they only observe the noisy gradient vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may not converge to a Nash equilibrium with noisy feedback. On the contrary, M2WU exhibits the last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We then prove that it converges to an exact Nash equilibrium by iteratively adapting the mutation term. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.
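To make the setting concrete, the following is a minimal sketch of the baseline the abstract compares against: plain MWU in a two-player zero-sum normal-form game, with an optional noisy-gradient mode and an exploitability measure. It is not the authors' implementation, and it does not include M2WU's mutation term, which the abstract does not specify. The payoff matrix (rock-paper-scissors), the Gaussian noise model, the function names, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def matvec(A, y):
    """Row player's utility gradient: (A y)_i for each row action i."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def vecmat(x, A):
    """(x^T A)_j for each column action j."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]


def exploitability(A, x, y):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    return max(matvec(A, y)) - min(vecmat(x, A))


def mwu_step(strategy, grad, eta):
    """One multiplicative weights update followed by renormalization."""
    weights = [p * math.exp(eta * g) for p, g in zip(strategy, grad)]
    total = sum(weights)
    return [w / total for w in weights]


def run_mwu(A, x, y, eta=0.1, steps=1000, noise_std=0.0, seed=0):
    """Simultaneous MWU for both players.

    noise_std == 0 corresponds to full feedback (exact gradients).
    noise_std > 0 adds Gaussian noise to each gradient entry; this noise
    model is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        gx = matvec(A, y)                 # row player maximizes x^T A y
        gy = [-v for v in vecmat(x, A)]   # column player maximizes -x^T A y
        if noise_std > 0:
            gx = [g + rng.gauss(0.0, noise_std) for g in gx]
            gy = [g + rng.gauss(0.0, noise_std) for g in gy]
        x, y = mwu_step(x, gx, eta), mwu_step(y, gy, eta)
    return x, y


if __name__ == "__main__":
    # Rock-paper-scissors payoffs for the row player (illustrative choice).
    A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    x0, y0 = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
    for noise in (0.0, 0.1):
        x, y = run_mwu(A, x0, y0, noise_std=noise)
        print(f"noise_std={noise}: exploitability={exploitability(A, x, y):.4f}")
```

Tracking the exploitability of the final strategies, rather than of their time average, is what distinguishes a last-iterate measurement from the average-iterate guarantees usually stated for MWU.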
