Title


Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement Learning

Authors

Jennifer She, Jayesh K. Gupta, Mykel J. Kochenderfer

Abstract


Sparse and delayed rewards pose a challenge to single agent reinforcement learning. This challenge is amplified in multi-agent reinforcement learning (MARL) where credit assignment of these rewards needs to happen not only across time, but also across agents. We propose Agent-Time Attention (ATA), a neural network model with auxiliary losses for redistributing sparse and delayed rewards in collaborative MARL. We provide a simple example that demonstrates how providing agents with their own local redistributed rewards and shared global redistributed rewards motivate different policies. We extend several MiniGrid environments, specifically MultiRoom and DoorKey, to the multi-agent sparse delayed rewards setting. We demonstrate that ATA outperforms various baselines on many instances of these environments. Source code of the experiments is available at https://github.com/jshe/agent-time-attention.
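The core idea of reward redistribution can be illustrated with a minimal sketch. The code below is not the paper's actual ATA architecture; it is an illustrative PyTorch model (all names and dimensions are assumptions) that attends over a joint agent-time sequence and predicts dense per-step rewards, with an auxiliary loss encouraging the predicted rewards to sum to the observed sparse episode return.

```python
import torch
import torch.nn as nn


class RedistributionSketch(nn.Module):
    """Illustrative sketch (not the paper's exact model): map a flattened
    agent-time sequence of observations to per-step redistributed rewards."""

    def __init__(self, obs_dim: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        # Self-attention jointly over agent and time positions,
        # so credit can flow both across time and across agents.
        self.attn = nn.MultiheadAttention(hidden, num_heads=heads, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T * n_agents, obs_dim) -> rewards: (batch, T * n_agents)
        h = torch.relu(self.encoder(obs))
        h, _ = self.attn(h, h, h)
        return self.head(h).squeeze(-1)


def auxiliary_loss(pred_rewards: torch.Tensor, episode_return: torch.Tensor) -> torch.Tensor:
    """Redistributed rewards should sum to the sparse episode return."""
    return ((pred_rewards.sum(dim=-1) - episode_return) ** 2).mean()
```

A policy trained against these dense predicted rewards receives per-step feedback instead of a single delayed terminal reward; whether each agent gets its own local redistributed reward or a shared global one changes the incentives, as the paper's motivating example shows.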
