Paper Title


A Policy Resonance Approach to Solve the Problem of Responsibility Diffusion in Multiagent Reinforcement Learning

Authors

Fu, Qingxu; Qiu, Tenghai; Yi, Jianqiang; Pu, Zhiqiang; Ai, Xiaolin; Yuan, Wanmai

Abstract


SOTA multiagent reinforcement learning algorithms distinguish themselves in many ways from their single-agent equivalents. However, most of them still fully inherit the single-agent exploration-exploitation strategy. Naively inheriting this strategy from single-agent algorithms causes potential collaboration failures, in which the agents blindly follow mainstream behaviors and reject taking minority responsibility. We name this problem Responsibility Diffusion (RD), as it shares similarities with a social psychology effect of the same name. In this work, we start by theoretically analyzing the cause of this RD problem, which can be traced back to the exploration-exploitation dilemma of multiagent systems (especially large-scale multiagent systems). We address this RD problem by proposing a Policy Resonance (PR) approach, which modifies the collaborative exploration strategy of agents by refactoring the joint agent policy while keeping individual policies approximately invariant. Next, we show that SOTA algorithms can be equipped with this approach to promote the collaborative performance of agents in complex cooperative tasks. Experiments are performed on multiple benchmark tasks to illustrate the effectiveness of this approach.
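The core idea described in the abstract (reshaping the joint exploration behavior while leaving each agent's marginal policy approximately unchanged) can be illustrated with a toy sketch. The code below is NOT the paper's actual Policy Resonance algorithm; it is a minimal, hypothetical illustration assuming an ε-greedy setting, contrasting independent per-agent exploration with a correlated variant in which a single shared random draw decides whether all agents explore in the same step. Each agent still explores with marginal probability ε, but the joint policy concentrates exploration into shared time steps instead of letting a lone explorer deviate while the majority exploits.

```python
import numpy as np


def independent_eps_greedy(q_values, eps, rng):
    """Baseline: each agent independently explores with probability eps."""
    actions = []
    for q in q_values:
        if rng.random() < eps:
            # Agent explores alone; the rest of the team may still exploit.
            actions.append(int(rng.integers(len(q))))
        else:
            actions.append(int(np.argmax(q)))
    return actions


def correlated_eps_greedy(q_values, eps, rng):
    """Correlated ("resonant") variant: one shared draw decides whether
    ALL agents explore in this step. Each agent's marginal exploration
    probability is still eps, so individual policies are approximately
    invariant, but the joint policy is refactored: agents explore
    together rather than deviating from the team one at a time."""
    explore_together = rng.random() < eps  # shared team-level draw
    actions = []
    for q in q_values:
        if explore_together:
            actions.append(int(rng.integers(len(q))))
        else:
            actions.append(int(np.argmax(q)))
    return actions
```

With `eps = 0` both variants reduce to the same greedy joint action; the difference appears only in how exploration events are distributed across agents within a step.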
