Paper Title

On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

Authors

Ezra Tampubolon, Haris Ceribasic, Holger Boche

Abstract

In this work, we study a system of two interacting, non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of independent learners. The resulting post-learning policies are almost optimal in the sense of the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of two subsequent actions of the opponent and yields an optimal strategy provided the opponent applies a stationary strategy, and we discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
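
The abstract describes the setting only at a high level, so the following is a minimal, hypothetical sketch and not the authors' algorithm. It assumes a repeated 2x2 matrix game with stateless Q-learning (no states, no bootstrapping), where the informed agent conditions its Q-table on the uninformed agent's current action. The payoff matrices `R0`/`R1`, the learning rate `alpha`, the exploration rate `eps`, and the helper names `epsilon_greedy`/`learn` are all assumptions chosen for illustration. This particular game has no pure-strategy Nash equilibrium when both agents move simultaneously, whereas with the observation the greedy policies would be expected to settle on a stable outcome.

```python
import numpy as np

# Hypothetical 2x2 general-sum game (NOT taken from the paper).
# Rows: action a0 of the uninformed agent; columns: action a1 of the informed agent.
R0 = np.array([[3.0, 1.0], [4.0, 0.0]])  # payoffs of the uninformed agent
R1 = np.array([[2.0, 0.0], [0.0, 2.0]])  # payoffs of the informed agent


def epsilon_greedy(q, eps, rng):
    """Pick a uniformly random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))  # ties go to the lowest index


def learn(steps=20_000, alpha=0.1, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q0 = np.zeros(2)       # uninformed agent: one value per own action
    q1 = np.zeros((2, 2))  # informed agent: values conditioned on the observed a0
    for _ in range(steps):
        a0 = epsilon_greedy(q0, eps, rng)
        a1 = epsilon_greedy(q1[a0], eps, rng)  # the informed agent sees a0 first
        # Stateless Q-learning updates for a repeated one-shot game.
        q0[a0] += alpha * (R0[a0, a1] - q0[a0])
        q1[a0, a1] += alpha * (R1[a0, a1] - q1[a0, a1])
    return q0, q1


q0, q1 = learn()
leader = int(np.argmax(q0))                          # greedy action of the uninformed agent
follower = [int(np.argmax(q1[a])) for a in range(2)]  # greedy response to each observed a0
print(leader, follower)
```

In this toy game the informed agent's greedy policy becomes a best response to each observed action, and the uninformed agent's greedy action becomes the one that is best given that response. This is a stable pair, even though no pure equilibrium exists under simultaneous moves.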
