Learning Rationalizable Equilibria in Multiplayer Games

Paper Title

Learning Rationalizable Equilibria in Multiplayer Games

Authors

Yuanhao Wang, Dingwen Kong, Yu Bai, Chi Jin

Abstract

A natural goal in multiagent learning besides finding equilibria is to learn rationalizable behavior, where players learn to avoid iteratively dominated actions. However, even in the basic setting of multiplayer general-sum games, existing algorithms require a number of samples exponential in the number of players to learn rationalizable equilibria under bandit feedback. This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE) whose sample complexities are polynomial in all problem parameters including the number of players. To achieve this result, we also develop a new efficient algorithm for the simpler task of finding one rationalizable action profile (not necessarily an equilibrium), whose sample complexity substantially improves over the best existing results of Wu et al. (2021). Our algorithms incorporate several novel techniques to guarantee rationalizability and no (swap-)regret simultaneously, including a correlated exploration scheme and adaptive learning rates, which may be of independent interest. We complement our results with a sample complexity lower bound showing the sharpness of our guarantees.
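
To make "iteratively dominated actions" concrete, here is a minimal sketch of iterated elimination of strictly dominated actions in a small game whose payoffs are fully known. It is only an illustration of the concept, not the paper's algorithm: the paper works under bandit feedback with sampled payoffs, whereas this sketch assumes exact payoff tables and, as a simplification, checks dominance by pure actions only. The function name `iterated_elimination` and the example payoff tables are hypothetical.

```python
from itertools import product


def iterated_elimination(payoffs, num_actions):
    """Iteratively remove strictly dominated actions (pure-action dominance only).

    payoffs[i] maps a full action profile (a tuple with one action per player)
    to player i's payoff. Returns the sorted surviving actions of each player.
    """
    n = len(num_actions)
    surviving = [set(range(k)) for k in num_actions]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            # Opponent profiles restricted to actions that have survived so far.
            others = [sorted(surviving[j]) for j in range(n) if j != i]
            opp_profiles = list(product(*others))

            def u(a, opp):
                # Re-insert player i's action at position i of the profile.
                return payoffs[i][opp[:i] + (a,) + opp[i:]]

            # a is dominated if some other surviving b is strictly better
            # against every surviving opponent profile.
            dominated = {
                a for a in surviving[i]
                if any(all(u(b, opp) > u(a, opp) for opp in opp_profiles)
                       for b in surviving[i] if b != a)
            }
            if dominated:
                surviving[i] -= dominated
                changed = True
    return [sorted(s) for s in surviving]


# Hypothetical 3x2 game: rows are player 0's actions, columns player 1's.
u0_table = [[3, 0], [2, 1], [0, 2]]
u1_table = [[0, 1], [0, 1], [0, 1]]
payoffs = [
    {(r, c): u0_table[r][c] for r in range(3) for c in range(2)},
    {(r, c): u1_table[r][c] for r in range(3) for c in range(2)},
]
result = iterated_elimination(payoffs, [3, 2])
print(result)
```

In this example no action of player 0 is dominated at first. Once player 1's action 0 is eliminated, player 0's actions 0 and 1 become dominated by action 2, which is why the elimination has to be iterated.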
