Paper Title

Value-Decomposition Multi-Agent Actor-Critics

Paper Authors

Su, Jianyu, Adams, Stephen, Beling, Peter A.

Abstract


The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance, by far, on multi-agent benchmarks, StarCraft II micromanagement tasks. However, our experiments show that, in some cases, QMIX is incompatible with A2C, a training paradigm that promotes algorithm training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value-decomposition to actor-critics that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critics (VDACs). We evaluate VDACs on the testbed of StarCraft II micromanagement tasks and demonstrate that the proposed framework improves median performance over other actor-critic methods. Furthermore, we use a set of ablation experiments to identify the key factors that contribute to the performance of VDACs.
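The abstract does not spell out the architecture, but the core idea it names is value decomposition combined with A2C-style training: a centralized joint value is decomposed into per-agent values, and the resulting advantage drives each decentralized actor. A minimal sketch of the simplest such scheme (VDN-style additive decomposition) is shown below; all names are illustrative assumptions, not the paper's actual networks or mixing function.

```python
# Illustrative sketch of additive value decomposition with an A2C-style
# advantage. The paper's VDAC framework generalizes this idea; the exact
# decomposition it uses is not reproduced here.

def joint_value(local_values):
    """Decomposed joint value: V_tot(s) = sum_i V_i(s_i)."""
    return sum(local_values)

def a2c_advantage(reward, gamma, local_values, next_local_values):
    """One-step advantage A = r + gamma * V_tot(s') - V_tot(s).

    Each actor i would be updated with grad log pi_i(a_i | s_i) * A,
    so one centralized TD error trains all decentralized policies.
    """
    return reward + gamma * joint_value(next_local_values) - joint_value(local_values)
```

Because the joint value is a monotonic (here, additive) function of the per-agent values, credit from the shared team reward flows to every agent through a single scalar advantage, which is what makes the scheme compatible with the parallel rollouts of A2C.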
