Paper Title

Learning Heterogeneous Agent Cooperation via Multiagent League Training

Authors

Fu, Qingxu, Ai, Xiaolin, Yi, Jianqiang, Qiu, Tenghai, Yuan, Wanmai, Pu, Zhiqiang

Abstract

Many multiagent systems in the real world include multiple types of agents with different abilities and functionality. Such heterogeneous multiagent systems have significant practical advantages. However, they also come with challenges compared with homogeneous systems for multiagent reinforcement learning, such as the non-stationary problem and the policy version iteration issue. This work proposes a general-purpose reinforcement learning algorithm named Heterogeneous League Training (HLT) to address heterogeneous multiagent problems. HLT keeps track of a pool of policies that agents have explored during training, gathering a league of heterogeneous policies to facilitate future policy optimization. Moreover, a hyper-network is introduced to increase the diversity of agent behaviors when collaborating with teammates having different levels of cooperation skills. We use heterogeneous benchmark tasks to demonstrate that (1) HLT promotes the success rate in cooperative heterogeneous tasks; (2) HLT is an effective approach to solving the policy version iteration problem; (3) HLT provides a practical way to assess the difficulty of learning each role in a heterogeneous team.
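The abstract's core mechanism, maintaining a league (pool) of heterogeneous policy snapshots and mixing current policies with past ones so agents learn to cooperate with teammates of varying skill, can be illustrated with a minimal sketch. All class and method names below are hypothetical; the paper's actual data structures and sampling scheme are not specified in the abstract.

```python
import random


class HeterogeneousLeague:
    """Illustrative league: a pool of past policy snapshots per role.

    Names and the sampling rule are assumptions for illustration,
    not the paper's actual implementation.
    """

    def __init__(self, roles, sample_past_prob=0.5, seed=0):
        self.roles = list(roles)
        self.pool = {role: [] for role in self.roles}    # past snapshots
        self.latest = {role: None for role in self.roles}  # current policy
        self.sample_past_prob = sample_past_prob
        self.rng = random.Random(seed)

    def update(self, role, policy):
        """Register a new policy version; archive the previous one."""
        if self.latest[role] is not None:
            self.pool[role].append(self.latest[role])
        self.latest[role] = policy

    def sample_team(self):
        """Assemble a training team: each role uses either its latest
        policy or a past snapshot, so agents face teammates with
        different levels of cooperation skill."""
        team = {}
        for role in self.roles:
            use_past = (self.pool[role]
                        and self.rng.random() < self.sample_past_prob)
            team[role] = (self.rng.choice(self.pool[role]) if use_past
                          else self.latest[role])
        return team


league = HeterogeneousLeague(["scout", "attacker"])
league.update("scout", "scout_v0")
league.update("attacker", "attacker_v0")
league.update("scout", "scout_v1")  # scout_v0 moves into the pool
team = league.sample_team()
```

Sampling past versions alongside the latest ones is also what makes the approach relevant to the policy version iteration issue the abstract mentions: newer policies are explicitly trained to remain compatible with older teammate versions.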
