Polynomial-Time Optimal Equilibria with a Mediator in Extensive-Form Games

Paper Title

Polynomial-Time Optimal Equilibria with a Mediator in Extensive-Form Games

Authors

Brian Hu Zhang, Tuomas Sandholm

Abstract

For common notions of correlated equilibrium in extensive-form games, computing an optimal (e.g., welfare-maximizing) equilibrium is NP-hard. Other equilibrium notions -- communication (Forges 1986) and certification (Forges & Koessler 2005) equilibria -- augment the game with a mediator that has the power to both send and receive messages to and from the players -- and, in particular, to remember the messages. In this paper, we investigate both notions in extensive-form games from a computational lens. We show that optimal equilibria in both notions can be computed in polynomial time, the latter under a natural additional assumption known in the literature. Our proof works by constructing a mediator-augmented game of polynomial size that explicitly represents the mediator's decisions and actions. Our framework allows us to define an entire family of equilibria by varying the mediator's information partition, the players' ability to lie, and the players' ability to deviate. From this perspective, we show that other notions of equilibrium, such as extensive-form correlated equilibrium, correspond to the mediator having imperfect recall. This shows that, at least among all these equilibrium notions, the hardness of computation is driven by the mediator's imperfect recall. As special cases of our general construction, we recover 1) the polynomial-time algorithm of Conitzer & Sandholm (2004) for automated mechanism design in Bayes-Nash equilibria and 2) the correlation DAG algorithm of Zhang et al. (2022) for optimal correlation. Our algorithm is especially scalable when the equilibrium notion is what we define as the full-certification equilibrium, where players cannot lie about their information but they can be silent. We back up our theoretical claims with experiments on a suite of standard benchmark games.
