Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

Paper title

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

Authors

Zhao, Jian; Zhang, Yue; Hu, Xunhan; Wang, Weixun; Zhou, Wengang; Hao, Jianye; Zhu, Jiangcheng; Li, Houqiang

Abstract

In cooperative multi-agent systems, agents act jointly and receive a team reward rather than individual rewards. In the absence of individual reward signals, a credit assignment mechanism is usually introduced to discriminate the contributions of different agents and thereby achieve effective cooperation. Recently, the value decomposition paradigm has been widely adopted to realize credit assignment, and QMIX has become the state-of-the-art solution. In this paper, we revisit QMIX from two aspects. First, we propose a new perspective on measuring credit assignment and empirically show that QMIX suffers from limited discriminability in the credits it assigns to agents. Second, we propose gradient entropy regularization for QMIX to realize discriminative credit assignment, thereby improving overall performance. Experiments demonstrate that our approach comparatively improves learning efficiency and achieves better performance.
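The abstract does not spell out the regularizer, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) a QMIX-style monotonic mixer whose weights are made non-negative with abs(), (2) agent i's credit is measured as the gradient dQ_tot/dQ_i, (3) these gradients are normalized into a distribution whose Shannon entropy serves as the "gradient entropy", and (4) that entropy is added to the TD loss with a hypothetical weight `lam`, so lowering it favors a less uniform, more discriminative split of credit. The two-layer mixer shape and every function name (`mix`, `credit_gradients`, `gradient_entropy`, `regularized_loss`) are hypothetical.

```python
import math
from typing import List

# Hypothetical QMIX-style mixer: abs() keeps mixing weights non-negative,
# so Q_tot is monotonically non-decreasing in every agent's Q_i.


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def elu_grad(x: float) -> float:
    return 1.0 if x > 0 else math.exp(x)


def _pre_activation(q: List[float], w1: List[List[float]], b1: List[float], k: int) -> float:
    # Hidden unit k: sum_i |w1[i][k]| * q_i + b1[k]
    return sum(abs(w1[i][k]) * q[i] for i in range(len(q))) + b1[k]


def mix(q: List[float], w1: List[List[float]], b1: List[float],
        w2: List[float], b2: float) -> float:
    """Q_tot = sum_k |w2_k| * elu(pre_k) + b2."""
    return b2 + sum(abs(w2[k]) * elu(_pre_activation(q, w1, b1, k))
                    for k in range(len(b1)))


def credit_gradients(q: List[float], w1: List[List[float]], b1: List[float],
                     w2: List[float]) -> List[float]:
    """Analytic dQ_tot/dQ_i, treated here (by assumption) as agent i's credit."""
    grads = [0.0] * len(q)
    for k in range(len(b1)):
        scale = abs(w2[k]) * elu_grad(_pre_activation(q, w1, b1, k))
        for i in range(len(q)):
            grads[i] += scale * abs(w1[i][k])
    return grads


def gradient_entropy(grads: List[float], eps: float = 1e-8) -> float:
    """Shannon entropy of p_i = g_i / sum_j g_j; eps avoids log(0)."""
    total = sum(g + eps for g in grads)
    probs = [(g + eps) / total for g in grads]
    return -sum(p * math.log(p) for p in probs)


def regularized_loss(td_loss: float, grads: List[float], lam: float = 0.01) -> float:
    """Assumed objective: TD loss plus lam * gradient entropy, so minimizing
    it pushes credit toward a less uniform (more discriminative) split."""
    return td_loss + lam * gradient_entropy(grads)


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]
    # Identical weights: every agent gets the same credit (maximum entropy).
    uniform = credit_gradients(q, [[1.0, 1.0]] * 3, [0.0, 0.0], [1.0, 1.0])
    print(f"{gradient_entropy(uniform):.4f}")  # → 1.0986
    # One dominant weight: credit concentrates on agent 0 (lower entropy).
    peaked = credit_gradients(q, [[3.0], [0.1], [0.1]], [0.0], [1.0])
    print(f"{gradient_entropy(peaked):.4f}")
```

With identical mixing weights every agent receives the same credit, so the entropy reaches its maximum of ln 3; a mixer that routes most of the gradient through one agent yields a much lower value. In a real learner the gradients would come from automatic differentiation over batched states, and the sign and weight of the entropy term should follow the paper itself rather than this sketch.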
