Paper Title
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation
Paper Authors
Paper Abstract
Vision-based robotics often separates the control loop into one module for perception and a separate module for control. It is possible to train the whole system end-to-end (e.g. with deep RL), but doing it "from scratch" comes with a high sample complexity cost, and the final result is often brittle, failing unexpectedly if the test environment differs from that of training. We study the effects of using mid-level visual representations (features learned asynchronously for traditional computer vision objectives) as a generic and easy-to-decode perceptual state in an end-to-end RL framework. Mid-level representations encode invariances about the world, and we show that they aid generalization, improve sample complexity, and lead to higher final performance. Compared to other approaches for incorporating invariances, such as domain randomization, asynchronously trained mid-level representations scale better: both to harder problems and to larger domain shifts. In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed. We report results on both manipulation and navigation tasks, and, for navigation, include zero-shot sim-to-real experiments on real robots.
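The abstract describes an architecture in which a frozen, asynchronously pre-trained mid-level vision encoder supplies features to an RL policy in place of raw pixels, so that RL only has to train the control head. Below is a minimal PyTorch sketch of that setup, not the paper's actual code: the class name, feature dimension, action count, and the `load_pretrained_midlevel_encoder` loader are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation):
# a frozen mid-level vision encoder feeds features to a small RL policy head,
# replacing raw pixels as the perceptual state.
import torch
import torch.nn as nn

class MidLevelPolicy(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, n_actions: int):
        super().__init__()
        self.encoder = encoder
        # Freeze the asynchronously pre-trained encoder: RL updates only the head.
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.policy_head = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),  # action logits for a discrete policy
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Encoder is fixed, so no gradients flow through it.
        with torch.no_grad():
            feats = self.encoder(obs).flatten(start_dim=1)
        return self.policy_head(feats)

# Hypothetical usage: plug in any mid-level encoder pre-trained on a vision
# objective (e.g. depth estimation or surface normals), then train only the
# head with the RL algorithm of choice.
# encoder = load_pretrained_midlevel_encoder()  # hypothetical loader
# policy = MidLevelPolicy(encoder, feat_dim=2048, n_actions=4)
```

Keeping the encoder frozen is what makes the representation "asynchronously trained" in the abstract's sense: the perception module is learned offline on computer-vision objectives and reused as-is, rather than updated by the RL loss.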