Paper Title

REMAX: Relational Representation for Multi-Agent Exploration

Paper Authors

Heechang Ryu, Hayong Shin, Jinkyoo Park

Paper Abstract

Training a multi-agent reinforcement learning (MARL) model with a sparse reward is generally difficult because numerous combinations of interactions among agents induce a certain outcome (i.e., success or failure). Earlier studies have tried to resolve this issue by employing an intrinsic reward to induce interactions that are helpful for learning an effective policy. However, this approach requires extensive prior knowledge for designing an intrinsic reward. To train the MARL model effectively without designing the intrinsic reward, we propose a learning-based exploration strategy to generate the initial states of a game. The proposed method adopts a variational graph autoencoder to represent a game state such that (1) the state can be compactly encoded to a latent representation by considering relationships among agents, and (2) the latent representation can be used as an effective input for a coupled surrogate model to predict an exploration score. The proposed method then finds new latent representations that maximize the exploration scores and decodes these representations to generate initial states from which the MARL model starts training in the game and thus experiences novel and rewardable states. We demonstrate that our method improves the training and performance of the MARL model more than the existing exploration methods.
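The abstract describes a three-part pipeline: a variational graph autoencoder (VGAE) compresses the relational game state into a latent representation, a coupled surrogate model maps that representation to an exploration score, and new initial states are generated by searching the latent space for high-scoring points and decoding them. The following is a minimal, illustrative sketch of that pipeline, not the authors' implementation; the GCN encoder, inner-product edge decoder, MLP surrogate, latent gradient-ascent search, and all module names and dimensions are assumptions made for illustration.

```python
# Hypothetical sketch of the REMAX-style pipeline from the abstract (not the paper's code):
# VGAE over agents' relational state -> surrogate exploration score -> latent search -> decode.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One simple graph-convolution layer: H' = relu(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj_norm, h):
        return torch.relu(self.lin(adj_norm @ h))

class VGAE(nn.Module):
    """Variational graph autoencoder over the agents' relational state (assumed architecture)."""
    def __init__(self, feat_dim, hid_dim, lat_dim):
        super().__init__()
        self.gcn = GCNLayer(feat_dim, hid_dim)
        self.mu = nn.Linear(hid_dim, lat_dim)
        self.logvar = nn.Linear(hid_dim, lat_dim)
        self.feat_dec = nn.Linear(lat_dim, feat_dim)  # decode node (agent) features back to a state

    def encode(self, adj_norm, x):
        h = self.gcn(adj_norm, x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def decode(self, z):
        adj_rec = torch.sigmoid(z @ z.t())  # inner-product decoder for agent relations
        x_rec = self.feat_dec(z)            # reconstructed agent features (initial state)
        return adj_rec, x_rec

class Surrogate(nn.Module):
    """Coupled surrogate: pooled latent representation -> predicted exploration score."""
    def __init__(self, lat_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(lat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z_nodes):
        return self.mlp(z_nodes.mean(dim=0))  # mean-pool over agents, then score

def generate_initial_state(vgae, surrogate, adj_norm, x, steps=50, lr=0.1):
    """Search the latent space for a representation that maximizes the predicted
    exploration score, then decode it into a new initial game state."""
    with torch.no_grad():
        mu, _ = vgae.encode(adj_norm, x)
    z = mu.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -surrogate(z).sum()  # gradient ascent on the exploration score
        loss.backward()
        opt.step()
    return vgae.decode(z.detach())  # (reconstructed relations, reconstructed features)

if __name__ == "__main__":
    n_agents, feat_dim = 4, 8
    x = torch.randn(n_agents, feat_dim)            # toy agent features
    adj = torch.ones(n_agents, n_agents)           # toy fully connected relations
    adj_norm = adj / adj.sum(dim=1, keepdim=True)  # row-normalized adjacency
    vgae, surrogate = VGAE(feat_dim, 16, 4), Surrogate(4)
    adj_rec, x_rec = generate_initial_state(vgae, surrogate, adj_norm, x)
    print(adj_rec.shape, x_rec.shape)
```

In this sketch the latent search is plain gradient ascent on the surrogate's output; the paper's actual state encoding, search procedure, and the training of the VGAE and surrogate on states collected during MARL training are not reproduced here.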
