最佳的前Ante协调合格策略的更快算法在广泛形式的零和游戏中

论文标题

最佳的前Ante协调合格策略的更快算法在广泛形式的零和游戏中

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games

论文作者

Farina, Gabriele, Celli, Andrea, Gatti, Nicola, Sandholm, Tuomas

论文摘要

我们关注的问题是，在不完美的零和大型游戏中，面对对手的两个球员找到最佳策略。球队成员在比赛期间不得进行交流，但可以在比赛前进行协调。在这种情况下，众所周知，团队可以做的最好的是从游戏开始时从联合（又称相关）的概率分布中对潜在的随机策略（每次玩家）进行了简介。在本文中，我们首先通过与有关广泛形式相关性的不同文献的联系来提供有关计算这种最佳分布的新建模结果。其次，我们提供了一种算法，该算法仅使用只有一个团队成员可以在每个配置文件中随机化的配置文件来计算这样的最佳分布。我们还可以限制解决方案中允许的此类配置文件的数量。这通过增加盖子来随时随地算法。我们发现，通常会有少数精心挑选的此类配置文件足以达到团队的最佳实用性。这使团队成员能够通过一个相对简单且易于理解的计划达成协调。最后，受到我们引入的理论概念的这种观察的启发，我们开发了一种有效的色谱柱生成算法，用于为团队找到最佳分布。我们在一套常见的基准游戏中对其进行了评估。这比后者可以解决的游戏中的先前最新状态要快三个数量级，它也可以解决以前无法解决的几款游戏。

We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of well-chosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable.

下载PDF全文

下载文献需遵守相关版权规定

论文标题