Paper Title


PaCo: Parameter-Compositional Multi-Task Reinforcement Learning

Authors

Lingfeng Sun, Haichao Zhang, Wei Xu, Masayoshi Tomizuka

Abstract


The purpose of multi-task reinforcement learning (MTRL) is to train a single policy applicable to a set of different tasks. Sharing parameters allows us to exploit the similarities among tasks. However, gaps in content and difficulty across tasks pose challenges both in deciding which tasks should share parameters and which parameters should be shared, as well as optimization challenges arising from parameter sharing. In this work, we introduce a parameter-compositional approach (PaCo) to address these challenges. In this framework, a policy subspace represented by a set of parameters is learned. Policies for all the single tasks lie in this subspace and can be composed by interpolating within the learned set. This allows not only flexible parameter sharing but also a natural way to improve training. We demonstrate state-of-the-art performance on Meta-World benchmarks, verifying the effectiveness of the proposed approach.
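The core idea described in the abstract (task policies composed by interpolating within a learned parameter set) can be illustrated with a minimal linear-algebra sketch. This is an assumption-laden illustration, not the paper's implementation: all names (`Phi`, `W`, `theta`) are hypothetical, and the policy parameters are reduced to flat vectors.

```python
import numpy as np

# Hypothetical sketch of parameter composition: each task's policy
# parameters are a linear combination of K shared parameter vectors
# (the learned "parameter set" spanning the policy subspace).
rng = np.random.default_rng(0)

n_params = 8   # dimensionality of a flattened policy parameter vector
K = 3          # number of vectors in the shared parameter set
n_tasks = 5

# Phi: shared parameter set; its columns span the policy subspace.
Phi = rng.standard_normal((n_params, K))
# W: per-task compositional coefficients (one column per task).
W = rng.standard_normal((K, n_tasks))

# Compose task-specific policy parameters inside the subspace.
theta = Phi @ W  # shape (n_params, n_tasks); column i parameterizes task i

# Interpolating two tasks' coefficients yields another policy that
# still lies in the same subspace (by linearity).
w_mix = 0.5 * (W[:, 0] + W[:, 1])
theta_mix = Phi @ w_mix
assert np.allclose(theta_mix, 0.5 * (theta[:, 0] + theta[:, 1]))
```

The shared matrix `Phi` is what all tasks learn jointly, while only the small coefficient vectors in `W` are task-specific, which is one way the abstract's "flexible parameter sharing" can be read.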
