Paper Title
Mutual Information Based Knowledge Transfer Under State-Action Dimension Mismatch
Paper Authors
Paper Abstract
Deep reinforcement learning (RL) algorithms have achieved great success on a wide variety of sequential decision-making tasks. However, many of these algorithms suffer from high sample complexity when learning from scratch using environmental rewards, due to issues such as credit assignment and high-variance gradients, among others. Transfer learning, in which knowledge gained on a source task is applied to learn a different but related target task more efficiently, is a promising approach to improving sample complexity in RL. Prior work has considered using pre-trained teacher policies to enhance the learning of a student policy, albeit under the constraint that the teacher and student MDPs share the state space or the action space. In this paper, we propose a new transfer learning framework in which the teacher and the student can have arbitrarily different state and action spaces. To handle this mismatch, we produce embeddings that systematically extract knowledge from the teacher's policy and value networks and blend it into the student networks. To train the embeddings, we use a task-aligned loss and show that the representations can be enriched further by adding a mutual information loss. Using a set of challenging simulated robotic locomotion tasks involving many-legged centipedes, we demonstrate successful transfer learning when the teacher and student have different state and action spaces.
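The abstract does not specify which mutual information estimator is used; a common choice for an MI term between paired embeddings is an InfoNCE-style lower bound, where matched teacher/student embeddings (same index in a batch) are positives and all other pairings are negatives. The sketch below is a minimal, pure-Python illustration of that idea; the names (`info_nce_loss`, `temperature`, the weight `lam`, and the stand-in task loss) are hypothetical, not the paper's actual implementation.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors (lists of floats)."""
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(student_emb, teacher_emb, temperature=0.1):
    """InfoNCE-style lower bound on mutual information between paired
    student/teacher embeddings. Pair i is the positive for row i; every
    other teacher embedding in the batch serves as a negative."""
    n = len(student_emb)
    total = 0.0
    for i in range(n):
        logits = [dot(student_emb[i], teacher_emb[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax at the positive index
    return total / n

# Toy usage: a combined objective with a hypothetical task-aligned loss term.
teacher_emb = [[1.0, 0.0], [0.0, 1.0]]
student_emb = [[0.9, 0.1], [0.2, 0.8]]
lam = 0.5            # assumed weight on the MI term
task_loss = 1.23     # stand-in for a task-aligned loss value
total_loss = task_loss + lam * info_nce_loss(student_emb, teacher_emb)
```

Minimizing the InfoNCE term pushes each student embedding toward its matched teacher embedding and away from the others, which is one way a mutual information loss can enrich the transferred representation; aligned pairs give a strictly lower loss than mismatched ones.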