Multi-Agent Reinforcement Learning in Cournot Games

Paper title

Multi-Agent Reinforcement Learning in Cournot Games

Authors

Shi, Yuanyuan; Zhang, Baosen

Abstract

In this work, we study the interaction of strategic agents in continuous-action Cournot games with limited information feedback. The Cournot game is an essential market model for many socio-economic systems in which agents learn and compete without full knowledge of the system or of each other. We consider the dynamics of the policy gradient algorithm, a widely adopted reinforcement learning algorithm for continuous control, in concave Cournot games. We prove that policy gradient dynamics converge to the Nash equilibrium when the price function is linear or when there are two agents. To the best of our knowledge, this is the first convergence result for learning algorithms with continuous action spaces that do not fall in the no-regret class.
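To make the linear-price case concrete, here is a minimal sketch of simultaneous gradient play in a Cournot game. It is a simplified illustration, not the algorithm analyzed in the paper: each agent follows the exact gradient of its own profit, whereas policy gradient estimates that gradient from sampled payoffs. The price function p(Q) = a - bQ, the common marginal cost c, the step size, and the names `nash_quantity` and `gradient_play` are all assumptions introduced for this example.

```python
def nash_quantity(a, b, c, n):
    """Closed-form symmetric Nash quantity for linear price a - b*Q and marginal cost c."""
    return (a - c) / (b * (n + 1))


def gradient_play(a, b, c, n, step=0.1, iters=500):
    """Simultaneous gradient ascent on each agent's own profit (illustrative only).

    Profit of agent i: q_i * (a - b * Q) - c * q_i, where Q is total output.
    Its derivative with respect to q_i is: a - b * Q - b * q_i - c.
    """
    # Asymmetric (hypothetical) starting quantities: 0.5, 1.0, 1.5, ...
    q = [0.5 * (i + 1) for i in range(n)]
    for _ in range(iters):
        total = sum(q)
        price = a - b * total
        grads = [price - b * qi - c for qi in q]
        # All agents update at once; quantities are kept non-negative.
        q = [max(0.0, qi + step * g) for qi, g in zip(q, grads)]
    return q


if __name__ == "__main__":
    a, b, c, n = 10.0, 1.0, 1.0, 3
    print("learned quantities:", gradient_play(a, b, c, n))
    print("Nash quantity:     ", nash_quantity(a, b, c, n))
```

With these assumed parameters the closed-form Nash quantity is (10 - 1) / (1 * 4) = 2.25 per agent, and the iterates approach it as long as the step size is below 2 / (b(n + 1)).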


Paper title