Paper Title

Value-Decomposition Multi-Agent Actor-Critics

Paper Authors

Su, Jianyu, Adams, Stephen, Beling, Peter A.

Abstract


The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance, by far, on multi-agent benchmarks, StarCraft II micromanagement tasks. However, our experiments show that, in some cases, QMIX is incompatible with A2C, a training paradigm that promotes algorithm training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value-decomposition to actor-critics that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critics (VDACs). We evaluate VDACs on the testbed of StarCraft II micromanagement tasks and demonstrate that the proposed framework improves median performance over other actor-critic methods. Furthermore, we use a set of ablation experiments to identify the key factors that contribute to the performance of VDACs.
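The abstract does not spell out the architecture, but the core idea it names is value decomposition combined with A2C-style training: a centralized joint value is decomposed into per-agent values, and the resulting advantage drives each decentralized actor. A minimal sketch of the simplest such scheme (VDN-style additive decomposition) is shown below; all names are illustrative assumptions, not the paper's actual networks or mixing function.

```python
# Illustrative sketch of additive value decomposition with an A2C-style
# advantage. The paper's VDAC framework generalizes this idea; the exact
# decomposition it uses is not reproduced here.

def joint_value(local_values):
    """Decomposed joint value: V_tot(s) = sum_i V_i(s_i)."""
    return sum(local_values)

def a2c_advantage(reward, gamma, local_values, next_local_values):
    """One-step advantage A = r + gamma * V_tot(s') - V_tot(s).

    Each actor i would be updated with grad log pi_i(a_i | s_i) * A,
    so one centralized TD error trains all decentralized policies.
    """
    return reward + gamma * joint_value(next_local_values) - joint_value(local_values)
```

Because the joint value is a monotonic (here, additive) function of the per-agent values, credit from the shared team reward flows to every agent through a single scalar advantage, which is what makes the scheme compatible with the parallel rollouts of A2C.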
