Paper Title
Self-supervised Visual Reinforcement Learning with Object-centric Representations
Paper Authors
Paper Abstract
Autonomous agents need large repertoires of skills to act reasonably on new tasks that they have not seen before. However, acquiring these skills using only a stream of high-dimensional, unstructured, and unlabeled observations is a formidable challenge for any autonomous agent. Previous methods have used variational autoencoders to encode a scene into a low-dimensional vector that can be used as a goal for an agent to discover new skills. Nevertheless, in compositional/multi-object environments it is difficult to disentangle all the factors of variation into such a fixed-length representation of the whole scene. We propose to use object-centric representations as a modular and structured observation space, which is learned with a compositional generative world model. We show that the structure in the representations, in combination with goal-conditioned attention policies, helps the autonomous agent to discover and learn useful skills. These skills can be further combined to address compositional tasks like the manipulation of several different objects.
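To make the idea of a goal-conditioned attention policy over object-centric representations concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the slot representations, the linear action head `w_out`, and the use of a single dot-product attention step are all simplifying assumptions for illustration. The goal acts as a query that attends over per-object slots, so the policy can focus on the object relevant to the current goal.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def goal_conditioned_attention_policy(slots, goal, w_out):
    """Hypothetical sketch of a goal-conditioned attention policy.

    slots: (K, D) array, one learned representation per object in the scene
    goal:  (D,)  goal vector (e.g. the target object's desired representation)
    w_out: (D, A) placeholder linear action head (stands in for a trained MLP)
    """
    d = slots.shape[1]
    scores = slots @ goal / np.sqrt(d)   # (K,) dot-product attention scores
    weights = softmax(scores)            # attention weights sum to 1
    context = weights @ slots            # (D,) goal-weighted mixture of slots
    action = context @ w_out             # (A,) action output
    return action, weights

# Toy usage: 4 object slots of dimension 8, 2-D action space.
rng = np.random.default_rng(0)
K, D, A = 4, 8, 2
slots = rng.normal(size=(K, D))
goal = slots[2]                          # goal refers to one object's slot
w_out = rng.normal(size=(D, A))
action, weights = goal_conditioned_attention_policy(slots, goal, w_out)
```

Because the observation space is a set of per-object slots rather than one fixed-length scene vector, the same policy applies unchanged as objects are added or removed; only the number of rows in `slots` changes.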