在系统设计中利用无重格算法

论文标题

在系统设计中利用无重格算法

Exploiting No-Regret Algorithms in System Design

论文作者

Dinh, Le Cong, Bishop, Nick, Tran-Thanh, Long

论文摘要

我们研究了一个重复的两人零和游戏设置，该游戏设置也是系统的设计师，并对收益矩阵的设计完全控制。此外，ROW播放器使用无重格算法来有效地学习如何随着时间的推移将其策略调整到列播放器的行为中，以实现良好的总收益。专栏播放器的目标是指导对手选择有利于系统设计师的混合策略。因此，她需要：（i）设计适当的回报矩阵$ a $，其独特的minimax解决方案包含了行播放器所需的混合策略；（ii）在一系列戏剧序列中战略性地与Row玩家互动，以指导其对手融合所需的行为。为了设计这样的回报矩阵，我们提出了一种新颖的解决方案，该解决方案可证明具有独特的最小解决方案，并具有所需的行为。我们还调查了不需要唯一性的这个问题的放松，但是所有的最小解决方案对于行播放器都具有相同的混合策略。最后，我们为系统设计师提出了一个新游戏，并证明它可以指导ROW播放器，他们可能会播放\ emph {stable} no-regret算法，以收敛到Minimax解决方案。

We investigate a repeated two-player zero-sum game setting where the column player is also a designer of the system, and has full control on the design of the payoff matrix. In addition, the row player uses a no-regret algorithm to efficiently learn how to adapt their strategy to the column player's behaviour over time in order to achieve good total payoff. The goal of the column player is to guide her opponent to pick a mixed strategy which is favourable for the system designer. Therefore, she needs to: (i) design an appropriate payoff matrix $A$ whose unique minimax solution contains the desired mixed strategy of the row player; and (ii) strategically interact with the row player during a sequence of plays in order to guide her opponent to converge to that desired behaviour. To design such a payoff matrix, we propose a novel solution that provably has a unique minimax solution with the desired behaviour. We also investigate a relaxation of this problem where uniqueness is not required, but all the minimax solutions have the same mixed strategy for the row player. Finally, we propose a new game playing algorithm for the system designer and prove that it can guide the row player, who may play a \emph{stable} no-regret algorithm, to converge to a minimax solution.

下载PDF全文

下载文献需遵守相关版权规定

论文标题