Paper Title

Model-Free Opponent Shaping

Paper Authors

Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

Paper Abstract

In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic since only a small number of steps can be anticipated, are asymmetric since they treat other agents as naive learners, and require the use of higher-order derivatives, which are calculated through white-box access to an opponent's differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Shaping (M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of the underlying inner game. The meta-state consists of the inner policies, and the meta-policy produces a new inner policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent shaping. Empirically, M-FOS near-optimally exploits naive learners and other, more sophisticated algorithms from the literature. For example, to the best of our knowledge, it is the first method to learn the well-known Zero-Determinant (ZD) extortion strategy in the IPD. In the same settings, M-FOS leads to socially optimal outcomes under meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional settings.
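
The abstract's meta-game framing (meta-state = both players' inner policies, meta-policy = a map from the meta-state to M-FOS's next inner policy, meta-step = one episode of the inner game) can be made concrete with a toy sketch. The following is an illustrative simplification, not the authors' implementation: it assumes a one-shot prisoner's dilemma as the inner game, a naive gradient-learning opponent, and a crude evolution-strategies update standing in for the paper's "generic model-free optimisation methods". The paper itself uses the iterated game and richer policies, which is where behaviours such as ZD extortion arise; all function names below are hypothetical.

```python
# Minimal sketch of the M-FOS meta-game loop described in the abstract.
# Simplifying assumptions (not the paper's setup): one-shot prisoner's dilemma
# as the inner game, mixed strategies given by a cooperation probability,
# a naive gradient-learning opponent, and a toy evolution-strategies update
# as the generic model-free meta-optimiser.
import numpy as np

rng = np.random.default_rng(0)

# Prisoner's dilemma payoffs for the row player: reward, sucker, temptation, punishment.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def expected_payoff(p_self, p_other):
    """Row player's expected payoff when each side cooperates with the given probability."""
    return (p_self * p_other * R + p_self * (1 - p_other) * S
            + (1 - p_self) * p_other * T + (1 - p_self) * (1 - p_other) * P)

def naive_learner_step(p_opp, p_mfos, lr=0.2):
    """Naive opponent: one gradient-ascent step on its own expected payoff."""
    grad = p_mfos * (R - T) + (1 - p_mfos) * (S - P)  # d/dp_opp of its payoff
    return np.clip(p_opp + lr * grad, 0.0, 1.0)

def meta_policy(theta, meta_state):
    """Meta-policy: maps the meta-state (both inner policies) to M-FOS's next inner policy."""
    logits = theta[0] * meta_state[0] + theta[1] * meta_state[1] + theta[2]
    return 1.0 / (1.0 + np.exp(-logits))  # cooperation probability in [0, 1]

def meta_episode(theta, n_meta_steps=50):
    """One meta-episode: each meta-step is an episode of the underlying inner game."""
    p_mfos, p_opp = 0.5, 0.5                     # initial inner policies
    meta_return = 0.0
    for _ in range(n_meta_steps):
        meta_state = np.array([p_mfos, p_opp])   # meta-state = the inner policies
        p_mfos = meta_policy(theta, meta_state)  # produce the next inner policy
        meta_return += expected_payoff(p_mfos, p_opp)
        p_opp = naive_learner_step(p_opp, p_mfos)  # opponent learns between episodes
    return meta_return

# Generic model-free meta-optimisation (here: a crude evolution-strategies update).
theta = np.zeros(3)
for _ in range(200):
    noise = rng.normal(size=(16, 3))
    returns = np.array([meta_episode(theta + 0.1 * n) for n in noise])
    theta += 0.02 * (returns - returns.mean()) @ noise / (16 * 0.1)

print("trained meta-return over 50 inner episodes:", meta_episode(theta))
```

The point of the sketch is only the loop structure: the opponent updates its inner policy between episodes, while the meta-policy conditions on both inner policies and is optimised purely on the long-horizon meta-return, with no higher-order derivatives or white-box access to the opponent's learning algorithm.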
