Paper title
Playing against no-regret players
Paper authors
Paper abstract
In an increasing variety of contexts, a human player has to interact with artificial players whose decisions follow decision-making algorithms. How should the human player play against these algorithms to maximize his utility? Does anything change depending on whether he faces one or several artificial players? The main goal of the paper is to answer these two questions. Consider n-player games in normal form repeated over time, where we call the human player the optimizer and the (n - 1) artificial players the learners. We assume that the learners play no-regret algorithms, a class of algorithms widely used in online learning and decision-making. In these games, we consider the concept of Stackelberg equilibrium. In a recent paper, Deng, Schneider, and Sivan showed that in a 2-player game the optimizer can always guarantee an expected cumulative utility of at least the Stackelberg value per round. In our first result, we show, with counterexamples, that this result no longer holds if the optimizer has to face more than one player. Therefore, we generalize the definition of Stackelberg equilibrium by introducing the concept of correlated Stackelberg equilibrium. Finally, in the main result, we prove that the optimizer can guarantee at least the correlated Stackelberg value per round. Moreover, using a version of the strong law of large numbers, we show that our result also holds almost surely for the optimizer's utility, not only for the optimizer's expected utility.
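To make the notion of a no-regret learner concrete, the following is a minimal sketch of one well-known no-regret algorithm, Hedge (multiplicative weights), run against a fixed sequence of optimizer actions. The abstract does not say which no-regret algorithms the learners use, so the choice of Hedge, the function names `hedge_probabilities` and `average_regret`, the payoff matrix, and the learning rate are all illustrative assumptions rather than the paper's construction.

```python
import math


def hedge_probabilities(cumulative, eta):
    """Hedge mixed strategy: weight of each action is exp(eta * cumulative payoff).

    The maximum is subtracted before exponentiating for numerical stability;
    this does not change the normalized probabilities.
    """
    top = max(cumulative)
    weights = [math.exp(eta * (c - top)) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def average_regret(payoffs, optimizer_actions):
    """Average per-round regret of a Hedge learner (hypothetical helper).

    payoffs[a][b] is the learner's payoff in [0, 1] when the learner plays
    action a and the optimizer plays action b. The learner sees the full
    payoff vector after each round (full-information feedback, an assumption).
    """
    n_actions = len(payoffs)
    horizon = len(optimizer_actions)
    # Standard learning rate for payoffs in [0, 1] and a known horizon.
    eta = math.sqrt(8.0 * math.log(n_actions) / horizon)

    cumulative = [0.0] * n_actions  # payoff each fixed action would have earned
    earned = 0.0                    # learner's expected cumulative payoff

    for b in optimizer_actions:
        probs = hedge_probabilities(cumulative, eta)
        round_payoffs = [payoffs[a][b] for a in range(n_actions)]
        earned += sum(p * u for p, u in zip(probs, round_payoffs))
        cumulative = [c + u for c, u in zip(cumulative, round_payoffs)]

    # Regret: best fixed action in hindsight minus what the learner earned.
    return (max(cumulative) - earned) / horizon


if __name__ == "__main__":
    # Illustrative 2x2 game in which the learner wants to match the optimizer.
    learner_payoffs = [[1.0, 0.0], [0.0, 1.0]]
    alternating = [t % 2 for t in range(10_000)]
    print(average_regret(learner_payoffs, alternating))
```

With this learning rate, the known worst-case guarantee for Hedge is an average regret of at most sqrt(ln(k) / (2T)) for k actions over T rounds, whatever sequence the optimizer plays. Vanishing average regret is the only property of the learners that the abstract relies on.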