Paper Title
Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting
Paper Authors
Paper Abstract
Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies. In a lifelong learning setting, in which an agent is faced with multiple consecutive tasks over its lifetime, reusing information from previously seen tasks can substantially accelerate the learning of new tasks. We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients, allowing the agent to benefit from accumulated knowledge throughout the entire training process. We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines, and completely avoids catastrophic forgetting on a variety of challenging domains.
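To make the idea of training a factored policy directly via policy gradients concrete, below is a minimal Python sketch. It assumes the factorization suggested by the title, theta^(t) = L s^(t), with a shared basis L reused across tasks and task-specific coefficients s^(t), and applies a plain REINFORCE-style Gaussian policy gradient to both factors. The toy task, dimensions, and hyperparameters are illustrative assumptions, not the paper's actual algorithm or experimental setup.

# Minimal sketch of a policy-gradient step on a factored policy,
# assuming the factorization theta^(t) = L @ s^(t): a shared basis L
# reused across tasks and a small task-specific coefficient vector s.
# The toy task and all hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, NUM_FACTORS = 4, 2, 3
L = rng.normal(scale=0.1, size=(STATE_DIM * ACTION_DIM, NUM_FACTORS))  # shared basis
s = rng.normal(scale=0.1, size=NUM_FACTORS)                            # task-specific coefficients
SIGMA, LR, EPISODES, HORIZON = 0.5, 1e-2, 200, 10

def reward(state, action):
    # Toy continuous task: the agent is rewarded for matching a
    # fixed linear function of the state.
    target = state[:ACTION_DIM]
    return -np.sum((action - target) ** 2)

for episode in range(EPISODES):
    grad_theta = np.zeros(STATE_DIM * ACTION_DIM)
    ret = 0.0
    for _ in range(HORIZON):
        state = rng.normal(size=STATE_DIM)
        theta = (L @ s).reshape(ACTION_DIM, STATE_DIM)   # factored policy weights
        mean = theta @ state
        action = mean + SIGMA * rng.normal(size=ACTION_DIM)
        r = reward(state, action)
        ret += r
        # REINFORCE: grad of log pi(a|s) for a Gaussian policy,
        # weighted by the observed reward.
        glogpi = np.outer((action - mean) / SIGMA**2, state).ravel()
        grad_theta += r * glogpi
    grad_theta /= HORIZON
    # Chain rule through the factorization theta = L @ s: the same
    # policy gradient updates both the shared basis and the
    # task-specific coefficients, so knowledge accumulates in L.
    L += LR * np.outer(grad_theta, s)
    s += LR * L.T @ grad_theta
    if episode % 50 == 0:
        print(f"episode {episode:3d}  avg reward {ret / HORIZON:8.3f}")

In a lifelong setting, one s^(t) would be kept per task while L is shared, so updating L with gradients from a new task is what allows transfer; how to do that without degrading earlier tasks (avoiding catastrophic forgetting) is the part this sketch deliberately leaves out.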