Paper Title
CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning
Paper Authors
Paper Abstract
In reinforcement learning (RL), it is challenging to learn directly from high-dimensional observations, where data augmentation has recently been shown to remedy this by encoding invariances from raw pixels. Nevertheless, we empirically find that not all samples are equally important, and hence simply injecting more augmented inputs may instead cause instability in Q-learning. In this paper, we approach this problem systematically by developing a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF), which can fully exploit sample importance and improve learning efficiency in a self-supervised manner. Facilitated by the proposed contrastive curiosity, CCLF is capable of prioritizing the experience replay, selecting the most informative augmented inputs, and, more importantly, regularizing the Q-function as well as the encoder to concentrate more on under-learned data. Moreover, it encourages the agent to explore with a curiosity-based reward. As a result, the agent can focus on more informative samples and learn representation invariances more efficiently, with significantly fewer augmented inputs required. We apply CCLF to several base RL algorithms and evaluate it on the DeepMind Control Suite, Atari, and MiniGrid benchmarks, where our approach demonstrates superior sample efficiency and learning performance compared with other state-of-the-art methods.
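The abstract itself contains no code; as a rough illustration only, the sketch below (PyTorch, all function and variable names hypothetical and not taken from the paper) shows one plausible way a contrastive-curiosity score could be computed as the dissimilarity between an encoder's embeddings of two augmented views of the same observation, and how such a per-sample score could then be reused both as a replay priority and as an intrinsic exploration bonus, in the spirit of what the abstract describes.

import torch
import torch.nn.functional as F

def contrastive_curiosity(encoder, obs_aug1, obs_aug2):
    # Hypothetical curiosity score: half of (1 - cosine similarity) between
    # the embeddings of two augmented views of the same observation.
    # A high score suggests the encoder has not yet learned invariance
    # for this sample, i.e. the sample is "under-learned".
    z1 = F.normalize(encoder(obs_aug1), dim=-1)
    z2 = F.normalize(encoder(obs_aug2), dim=-1)
    sim = (z1 * z2).sum(dim=-1)        # cosine similarity per sample
    return 0.5 * (1.0 - sim)           # score in [0, 1]

# Toy usage with a stand-in encoder and random "image" batches.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 84 * 84, 50))
obs = torch.rand(32, 3, 84, 84)
aug1 = obs + 0.01 * torch.randn_like(obs)   # two lightly perturbed views of the same batch
aug2 = obs + 0.01 * torch.randn_like(obs)

curiosity = contrastive_curiosity(encoder, aug1, aug2)   # shape (32,)
replay_priorities = curiosity / curiosity.sum()          # prioritize under-learned samples
intrinsic_reward = curiosity                             # curiosity-based exploration bonus

In this toy version the same score drives both replay prioritization and the exploration bonus; how the paper actually weights, normalizes, or injects these quantities into the Q-function and encoder updates is not specified in the abstract.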