Paper Title
Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games
Paper Authors
Paper Abstract
Many real-world multi-agent interactions involve multiple distinct criteria, i.e., the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of the other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on the agents that implement it. When no Nash equilibria exist, opponent learning awareness and modelling allow agents to still converge to meaningful solutions that approximate equilibria.
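The scalarised expected returns (SER) criterion named in the abstract applies the utility function to the *expected* multi-objective payoff vector of a mixed strategy, which under a non-linear utility differs from first scalarising each outcome and then taking the expectation (ESR). A minimal sketch of this distinction, using a hypothetical 2x2 MONFG payoff matrix and a product utility that are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical 2-objective payoff matrix for the row player (illustrative,
# not from the paper): payoffs[i, j] is the payoff vector when the row
# player takes action i and the column player takes action j.
payoffs = np.array([
    [[4.0, 0.0], [3.0, 1.0]],
    [[1.0, 3.0], [0.0, 4.0]],
])  # shape: (row action, column action, objective)

def utility(v):
    # Example non-linear utility: product of the two objectives.
    return v[0] * v[1]

def ser(row_policy, col_policy):
    # Scalarised expected returns: utility of the expected payoff vector
    # under the joint mixed strategy.
    expected = np.einsum('i,j,ijk->k', row_policy, col_policy, payoffs)
    return utility(expected)

def esr(row_policy, col_policy):
    # Expected scalarised returns: scalarise each deterministic outcome
    # first, then take the expectation over the joint strategy.
    scalarised = np.apply_along_axis(utility, 2, payoffs)  # shape (2, 2)
    return row_policy @ scalarised @ col_policy

row = np.array([0.5, 0.5])
col = np.array([0.5, 0.5])
print(ser(row, col))  # → 4.0  (utility of the expected vector [2, 2])
print(esr(row, col))  # → 1.5  (expectation of the per-outcome utilities)
```

The gap between the two values is exactly why non-linear utilities make mixed strategies, and hence learning about the opponent's policy, matter in MONFGs: under SER, the utility of a mixed strategy is not a convex combination of the utilities of its pure strategies.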