Paper Title
Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
Paper Authors
Paper Abstract
Context, the embedding of previously collected trajectories, is a powerful construct for Meta-Reinforcement Learning (Meta-RL) algorithms. By conditioning on an effective context, Meta-RL policies can easily generalize to new tasks within a few adaptation steps. We argue that improving the quality of context involves answering two questions: (1) How to train a compact and sufficient encoder that can embed the task-specific information contained in prior trajectories? (2) How to collect informative trajectories whose corresponding context reflects the specification of tasks? To this end, we propose a novel Meta-RL framework called CCM (Contrastive learning augmented Context-based Meta-RL). We first focus on the contrastive nature behind different tasks and leverage it to train a compact and sufficient context encoder. Furthermore, we train a separate exploration policy and theoretically derive a new information-gain-based objective that aims to collect informative trajectories in a few steps. Empirically, we evaluate our approach on common benchmarks as well as several complex sparse-reward environments. The experimental results show that CCM outperforms state-of-the-art algorithms by addressing the two problems above.
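The abstract's first idea, training a context encoder with a contrastive objective across tasks, can be illustrated with a minimal InfoNCE-style loss: embeddings of two trajectories from the same task form a positive pair, while embeddings from other tasks serve as negatives. This is a hedged sketch of that general technique, not the authors' exact CCM loss; the function name and vector inputs are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss on trajectory context embeddings.

    anchor, positive: embeddings of two trajectories from the SAME task.
    negatives: list of embeddings of trajectories from OTHER tasks.
    (Illustrative sketch; not the paper's exact objective.)
    """
    def norm(v):
        # L2-normalize so dot products are cosine similarities.
        return v / (np.linalg.norm(v) + 1e-8)

    anchor = norm(anchor)
    # Similarity to the positive first, then to each negative.
    sims = [anchor @ norm(positive)] + [anchor @ norm(n) for n in negatives]
    logits = np.array(sims) / temperature
    # Cross-entropy with the positive pair as the correct "class" (index 0).
    logits -= logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]
```

Minimizing this loss pulls same-task trajectory embeddings together and pushes different-task embeddings apart, which is what lets the encoder capture task-specific information compactly.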