Paper Title

Learning Context-aware Task Reasoning for Efficient Meta-reinforcement Learning

Paper Authors

Haozhe Wang, Jiale Zhou, Xuming He

Paper Abstract

Despite recent success of deep network-based Reinforcement Learning (RL), it remains elusive to achieve human-level efficiency in learning novel tasks. While previous efforts attempt to address this challenge using meta-learning strategies, they typically suffer from sampling inefficiency with on-policy RL algorithms or meta-overfitting with off-policy learning. In this work, we propose a novel meta-RL strategy to address those limitations. In particular, we decompose the meta-RL problem into three sub-tasks, task-exploration, task-inference and task-fulfillment, instantiated with two deep network agents and a task encoder. During meta-training, our method learns a task-conditioned actor network for task-fulfillment, an explorer network with a self-supervised reward shaping that encourages task-informative experiences in task-exploration, and a context-aware graph-based task encoder for task inference. We validate our approach with extensive experiments on several public benchmarks and the results show that our algorithm effectively performs exploration for task inference, improves sample efficiency during both training and testing, and mitigates the meta-overfitting problem.
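To make the three-part decomposition concrete, below is a minimal, hypothetical PyTorch sketch of the structure the abstract describes: an explorer network gathers task-informative transitions, a task encoder infers a latent task variable from that context, and a task-conditioned actor acts on the inferred task. All module names, layer sizes, and the mean-pooled MLP encoder (a simple stand-in for the paper's context-aware graph-based encoder) are illustrative assumptions, not the authors' implementation; the self-supervised reward shaping is likewise omitted.

```python
# Hypothetical sketch of the task-exploration / task-inference /
# task-fulfillment decomposition. Names and dimensions are assumptions.
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Infers a latent task embedding z from a batch of context transitions.
    A mean-pooled MLP stands in for the paper's graph-based encoder."""
    def __init__(self, transition_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, context):  # context: (num_transitions, transition_dim)
        # Aggregate per-transition encodings into a single task embedding.
        return self.net(context).mean(dim=0)

class Policy(nn.Module):
    """Shared architecture for both the explorer and the task-conditioned actor."""
    def __init__(self, obs_dim, latent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, z):
        # Condition the policy on the inferred task embedding.
        return self.net(torch.cat([obs, z], dim=-1))

obs_dim, act_dim, latent_dim = 8, 2, 5
transition_dim = obs_dim + act_dim + 1 + obs_dim  # (s, a, r, s') flattened

encoder = TaskEncoder(transition_dim, latent_dim)
explorer = Policy(obs_dim, latent_dim, act_dim)   # task-exploration agent
actor = Policy(obs_dim, latent_dim, act_dim)      # task-fulfillment agent

# Task-exploration -> task-inference -> task-fulfillment, on dummy data.
context = torch.randn(16, transition_dim)         # transitions from the explorer
z = encoder(context)                              # inferred task embedding
obs = torch.randn(obs_dim)
action = actor(obs, z)                            # task-conditioned action
```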
