Paper Title
Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach
Paper Authors
Paper Abstract
Peer-to-peer knowledge transfer in distributed environments has emerged as a promising approach to deep reinforcement learning, since it can accelerate learning and improve team-wide performance without relying on pre-trained teachers. However, traditional peer-to-peer methods such as action advising have difficulty expressing knowledge and advice efficiently. We therefore propose a new solution that reuses experiences and transfers value functions among multiple students via model distillation. Transferring the Q-function directly remains challenging, however, since it is unstable and unbounded. To address this issue, which also confronts existing works, we adopt the Categorical Deep Q-Network. We also describe how to design an efficient communication protocol that exploits heterogeneous knowledge among multiple distributed agents. Our proposed framework, Learning and Teaching Categorical Reinforcement (LTCR), shows promising performance in stabilizing and accelerating learning, with improved team-wide reward, in four typical experimental environments.
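
A minimal sketch may clarify why the Categorical Deep Q-Network (C51) sidesteps the unboundedness problem: each action's value is represented as a categorical distribution over a fixed, bounded support of atoms, so a peer can distill its value estimates into a teammate with a simple cross-entropy/KL loss between distributions. The PyTorch-style code below is an illustrative assumption, not the paper's released implementation; the names and hyperparameters (N_ATOMS, V_MIN, V_MAX, distillation_loss, expected_q) are hypothetical.

import torch
import torch.nn.functional as F

# Hypothetical sketch: distilling a peer's C51-style value distribution
# into a student. Each agent outputs, per action, a categorical
# distribution over a fixed support of atoms, so the transferred target
# is bounded by construction, unlike a raw Q-function.

N_ATOMS = 51                     # number of support atoms (C51 default)
V_MIN, V_MAX = -10.0, 10.0       # assumed bounds of the value support
ATOMS = torch.linspace(V_MIN, V_MAX, N_ATOMS)

def distillation_loss(student_logits, teacher_probs):
    # student_logits: (batch, n_actions, N_ATOMS) raw logits of the student.
    # teacher_probs:  (batch, n_actions, N_ATOMS) probabilities from a peer.
    log_p = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy part of KL(teacher || student); the teacher-entropy
    # term is constant w.r.t. the student and does not affect gradients.
    return -(teacher_probs * log_p).sum(dim=-1).mean()

def expected_q(probs):
    # Expected Q-values recovered from the distribution; always within
    # [V_MIN, V_MAX] because the support is fixed and bounded.
    return (probs * ATOMS).sum(dim=-1)

Because the teacher sends probability vectors over a shared, bounded support rather than raw Q-values, the distillation target stays well-scaled regardless of how the peers' reward magnitudes differ, which is the stability argument the abstract appeals to.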