Paper Title

A Maximum Mutual Information Framework for Multi-Agent Reinforcement Learning

Authors

Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung

Abstract

In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the mutual information between actions. By introducing a latent variable to induce nonzero mutual information between actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution (CTDE). We evaluated VM3-AC on several games requiring coordination, and numerical results show that it outperforms MADDPG and other MARL algorithms on such multi-agent tasks.
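
To make the abstract's construction concrete, the block below sketches the kind of MMI-regularized objective it describes, together with the standard Barber-Agakov variational bound that turns the intractable mutual-information terms into a tractable surrogate. The notation (state s_t, observations o^i, actions a^i, latent z, coefficient alpha, variational distribution q_xi) is our own shorthand, not necessarily the paper's exact formulation.

```latex
% MMI-regularized objective (shorthand): the accumulated return is
% augmented with the mutual information (MI) between agents' actions.
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}
  \Big( r(s_t, \mathbf{a}_t)
      + \alpha \sum_{i \neq j} I\big(a_t^{i};\, a_t^{j} \mid s_t\big) \Big)\right]

% A latent variable z \sim p(z) shared across the policies
% \pi^{i}(a^{i} \mid o^{i}, z) correlates the actions, making the MI nonzero.

% Barber--Agakov bound: for any variational distribution q_{\xi}, the MI
% between an action and the latent admits the tractable lower bound
I(a^{i}; z) \;\ge\; \mathcal{H}(a^{i})
  + \mathbb{E}_{p(a^{i}, z)}\big[\log q_{\xi}(a^{i} \mid z)\big]
% i.e., an entropy bonus plus a reward for actions that are predictable
% from the shared latent; maximizing a bound of this shape jointly over
% the policies and q_{\xi} is what the policy iteration operates on.
```

The following is a minimal structural sketch of the CTDE actor-critic layout the abstract names, written in PyTorch. All module names, network sizes, and the way z enters the actors are illustrative assumptions, not the authors' implementation.

```python
# Structural sketch (not the authors' code): decentralized actors that
# condition on a shared latent z, plus a centralized training-time critic.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM, LATENT_DIM = 3, 8, 2, 4  # illustrative sizes

class Actor(nn.Module):
    """Decentralized actor: uses only its local observation and the latent z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh(),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))

class CentralCritic(nn.Module):
    """Centralized critic: sees all observations and all actions in training."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, all_obs, all_actions):
        return self.net(torch.cat([all_obs, all_actions], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# Sampling one z from a shared prior and feeding it to every actor couples
# the agents' action distributions (nonzero mutual information). How agents
# obtain a common z at decentralized execution time is a detail of the
# paper that this sketch does not model.
z = torch.randn(1, LATENT_DIM)
obs = [torch.randn(1, OBS_DIM) for _ in range(N_AGENTS)]
actions = [actor(o, z) for actor, o in zip(actors, obs)]
q_value = critic(torch.cat(obs, dim=-1), torch.cat(actions, dim=-1))
print(q_value.shape)  # torch.Size([1, 1])
```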
