Paper Title
Universal Successor Features for Transfer Reinforcement Learning
Paper Authors
Paper Abstract
Transfer in Reinforcement Learning (RL) refers to the idea of applying knowledge gained from previous tasks to solving related tasks. Learning a universal value function (Schaul et al., 2015), which generalizes over goals and states, has previously been shown to be useful for transfer. However, successor features are believed to be more suitable than values for transfer (Dayan, 1993; Barreto et al., 2017), even though they cannot directly generalize to new goals. In this paper, we propose (1) Universal Successor Features (USFs) to capture the underlying dynamics of the environment while allowing generalization to unseen goals and (2) a flexible end-to-end model of USFs that can be trained by interacting with the environment. We show that learning USFs is compatible with any RL algorithm that learns state values using a temporal difference method. Our experiments in a simple gridworld and with two MuJoCo environments show that USFs can greatly accelerate training when learning multiple tasks and can effectively transfer knowledge to new tasks.
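To make the temporal-difference idea concrete, below is a minimal numpy sketch of the kind of TD-style update the abstract alludes to: a successor feature estimate psi(s, g) is moved toward the target phi(s) + gamma * psi(s', g), and the goal-conditioned value is recovered as psi(s, g)^T w(g). All names, shapes, and the tabular-style update are illustrative assumptions for exposition, not the paper's actual model.

```python
import numpy as np

# Illustrative sketch, not the paper's implementation.
# Assumptions: phi(s) is a d-dimensional state feature vector,
# psi(s, g) is the universal successor feature for state s and goal g,
# and w(g) is a goal-conditioned reward weight vector, so that
# V(s; g) = psi(s, g)^T w(g).

d = 8          # feature dimension (arbitrary for this example)
gamma = 0.99   # discount factor
alpha = 0.1    # learning rate

def td_update_usf(psi_s, psi_s_next, phi_s, w_g):
    """One TD step on a successor feature estimate for a fixed goal g.

    The Bellman target is phi(s) + gamma * psi(s', g); the update moves
    psi(s, g) toward it, and the value follows from the inner product
    with the goal's reward weights w(g)."""
    td_error = phi_s + gamma * psi_s_next - psi_s
    psi_s_new = psi_s + alpha * td_error
    value = psi_s_new @ w_g  # V(s; g) recovered from the USF
    return psi_s_new, value

# Toy usage with random vectors standing in for learned quantities.
rng = np.random.default_rng(0)
phi_s = rng.normal(size=d)
psi_s = rng.normal(size=d)
psi_s_next = rng.normal(size=d)
w_g = rng.normal(size=d)
psi_s, v = td_update_usf(psi_s, psi_s_next, phi_s, w_g)
```

In the paper's setting, psi and w would be the outputs of trained networks conditioned on the goal rather than free vectors, which is what allows generalization to unseen goals.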