Paper Title

Visual Reinforcement Learning with Self-Supervised 3D Representations

Paper Authors

Yanjie Ze, Nicklas Hansen, Yinbo Chen, Mohit Jain, Xiaolong Wang

Paper Abstract

A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample efficiency and generalization through additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely been focused on leveraging 2D computer vision techniques as auxiliary self-supervision. In this work, we present a unified framework for self-supervised learning of 3D representations for motor control. Our proposed framework consists of two phases: a pretraining phase where a deep voxel-based 3D autoencoder is pretrained on a large object-centric dataset, and a finetuning phase where the representation is jointly finetuned together with RL on in-domain data. We empirically show that our method enjoys improved sample efficiency in simulated manipulation tasks compared to 2D representation learning methods. Additionally, our learned policies transfer zero-shot to a real robot setup with only approximate geometric correspondence, and successfully solve motor control tasks that involve grasping and lifting from a single, uncalibrated RGB camera. Code and videos are available at https://yanjieze.com/3d4rl/.
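
As a rough illustration of the two-phase recipe described in the abstract, the sketch below pretrains a small voxel-based 3D autoencoder with a reconstruction loss and then reuses its encoder as the state representation that is finetuned jointly with RL. Everything here is an illustrative assumption rather than the paper's actual architecture: the `VoxelAutoencoder` name, the 32x32x32 occupancy grid, the layer sizes, and `latent_dim=256` are all placeholders.

```python
# Minimal PyTorch sketch of the two-phase idea (pretrain -> joint finetune).
# All names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class VoxelAutoencoder(nn.Module):
    """Small 3D-conv autoencoder over a 32^3 occupancy grid (illustrative)."""
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=4, stride=2, padding=1),   # 32^3 -> 16^3
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=4, stride=2, padding=1),  # 16^3 -> 8^3
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 ** 3, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 8 ** 3),
            nn.Unflatten(1, (32, 8, 8, 8)),
            nn.ConvTranspose3d(32, 16, kernel_size=4, stride=2, padding=1),  # 8^3 -> 16^3
            nn.ReLU(),
            nn.ConvTranspose3d(16, 1, kernel_size=4, stride=2, padding=1),   # 16^3 -> 32^3
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(voxels))

# Phase 1: pretrain with a reconstruction loss on an object-centric voxel dataset.
ae = VoxelAutoencoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-4)
batch = torch.rand(8, 1, 32, 32, 32)             # stand-in for real voxelized objects
loss = nn.functional.mse_loss(ae(batch), batch)  # reconstruction objective
opt.zero_grad()
loss.backward()
opt.step()

# Phase 2 (schematic): reuse the encoder as the policy's state representation
# and keep finetuning it jointly with the RL objective on in-domain data, e.g.
#   total_loss = rl_loss + aux_weight * reconstruction_loss
z = ae.encoder(batch)  # latent state fed to the actor/critic networks
```

The point of the joint finetuning step is that the reconstruction term keeps shaping the 3D features on in-domain observations during RL, rather than freezing them at their pretrained values.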
