Paper Title


Light-weight probing of unsupervised representations for Reinforcement Learning

Authors

Wancong Zhang, Anthony GX-Chen, Vlad Sobal, Yann LeCun, Nicolas Carion

Abstract


Unsupervised visual representation learning offers the opportunity to leverage large corpora of unlabeled trajectories to form useful visual representations, which can benefit the training of reinforcement learning (RL) algorithms. However, evaluating the fitness of such representations requires training RL algorithms, which is computationally intensive and has high-variance outcomes. Inspired by the vision community, we study whether linear probing can be a proxy evaluation task for the quality of unsupervised RL representations. Specifically, we probe for the observed reward in a given state and the action of an expert in a given state, both of which are generally applicable to many RL domains. Through rigorous experimentation, we show that the probing tasks are strongly rank-correlated with the downstream RL performance on the Atari100k benchmark, while having lower variance and up to 600x lower computational cost. This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes without the need to run RL evaluations for every setting. Leveraging this framework, we further improve existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.
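To illustrate the general idea of the expert-action probe described in the abstract, below is a minimal PyTorch sketch: a pretrained visual backbone is frozen, and only a linear head is trained to predict the expert's action from the frozen features (the reward probe is analogous, with a reward target). The `FrozenEncoder`, `backbone`, and data-loader names are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class FrozenEncoder(nn.Module):
    """Wraps a pretrained visual backbone and keeps its weights fixed."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # the representation is not updated during probing

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            return self.backbone(obs)

def train_linear_probe(encoder: FrozenEncoder, loader, feat_dim: int,
                       num_actions: int, epochs: int = 10, lr: float = 1e-3):
    """Fit a linear head on frozen features to predict an expert's action.

    `loader` is assumed to yield (observation, expert_action) batches.
    """
    probe = nn.Linear(feat_dim, num_actions)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, expert_action in loader:
            feats = encoder(obs)                 # frozen features
            loss = loss_fn(probe(feats), expert_action)
            opt.zero_grad()
            loss.backward()                      # only the linear head receives gradients
            opt.step()
    return probe
```

Because only the linear head is optimized, the probe's accuracy reflects how linearly decodable the task-relevant information is from the frozen representation, which is what makes it a cheap proxy for downstream RL performance.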
