Paper Title
Efficient Deep Reinforcement Learning via Adaptive Policy Transfer
Paper Authors
Paper Abstract
Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from policies learned on relevant past tasks. Existing transfer approaches either explicitly compute the similarity between tasks or select appropriate source policies to provide guided exploration for the target task. However, an approach that directly optimizes the target policy by alternately utilizing knowledge from appropriate source policies, without explicitly measuring task similarity, is currently missing. In this paper, we propose a novel Policy Transfer Framework (PTF) that exploits this idea to accelerate RL. Our framework learns when and which source policy is best to reuse for the target policy, and when to terminate it, by modeling multi-policy transfer as an option learning problem. PTF can be easily combined with existing deep RL approaches. Experimental results show that it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.
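To make the abstract's core idea concrete, the following is a minimal tabular sketch of treating each source policy as an option: the agent learns option values to decide which source policy to reuse and a termination probability to decide when to stop reusing it. This is an illustration of the general option-learning formulation, not the paper's actual implementation; all names (`OptionPolicyTransfer`, `beta_lr`, `xi`, etc.) are hypothetical.

```python
import numpy as np

class OptionPolicyTransfer:
    """Sketch: each "option" wraps one source policy. The agent learns
    option values Q(feature, o) and per-option termination probabilities
    beta(o), deciding which source policy to reuse and when to switch."""

    def __init__(self, source_policies, n_features, lr=0.1, beta_lr=0.05):
        self.source_policies = source_policies            # callables: state -> action
        self.n_options = len(source_policies)
        self.q = np.zeros((n_features, self.n_options))   # option values per state feature
        self.beta = np.full(self.n_options, 0.5)          # termination probabilities
        self.lr, self.beta_lr = lr, beta_lr
        self.current = None                               # index of the active option

    def select_option(self, feature, epsilon=0.1):
        """Keep the current option with prob. 1 - beta; otherwise pick anew."""
        if self.current is not None and np.random.rand() >= self.beta[self.current]:
            return self.current
        if np.random.rand() < epsilon:                    # epsilon-greedy over options
            self.current = np.random.randint(self.n_options)
        else:
            self.current = int(np.argmax(self.q[feature]))
        return self.current

    def act(self, state, feature, epsilon=0.1):
        """Delegate action selection to the currently reused source policy."""
        o = self.select_option(feature, epsilon)
        return self.source_policies[o](state)

    def update(self, feature, next_feature, reward, gamma=0.99, xi=0.01):
        """One-step option-value update and a termination adjustment.
        Call after act(), so self.current is the option that just acted."""
        o = self.current
        # Expected value at the next state: continue the option or switch.
        u = (1 - self.beta[o]) * self.q[next_feature, o] \
            + self.beta[o] * np.max(self.q[next_feature])
        td = reward + gamma * u - self.q[feature, o]
        self.q[feature, o] += self.lr * td
        # Raise beta when the option is suboptimal at the next state;
        # the margin xi lets beta shrink when the option is near-optimal.
        advantage = self.q[next_feature, o] - np.max(self.q[next_feature]) + xi
        self.beta[o] = np.clip(self.beta[o] - self.beta_lr * advantage, 0.05, 0.95)
```

In the full PTF framework described by the abstract, the tabular tables above would be replaced with learned function approximators, and the selected source policy would additionally guide the update of the target policy being optimized, rather than only driving exploration.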