Paper Title
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation
Paper Authors
Paper Abstract
Vision-based robotics often separates the control loop into one module for perception and a separate module for control. It is possible to train the whole system end-to-end (e.g. with deep RL), but doing it "from scratch" comes with a high sample complexity cost, and the final result is often brittle, failing unexpectedly if the test environment differs from that of training. We study the effects of using mid-level visual representations (features learned asynchronously for traditional computer vision objectives) as a generic and easy-to-decode perceptual state in an end-to-end RL framework. Mid-level representations encode invariances about the world, and we show that they aid generalization, improve sample complexity, and lead to higher final performance. Compared to other approaches for incorporating invariances, such as domain randomization, asynchronously trained mid-level representations scale better: both to harder problems and to larger domain shifts. In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed. We report results on both manipulation and navigation tasks, and, for navigation, include zero-shot sim-to-real experiments on real robots.
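The abstract describes an architecture in which a frozen, asynchronously pre-trained mid-level vision encoder supplies features to an RL policy in place of raw pixels, so that RL only has to train the control head. Below is a minimal PyTorch sketch of that setup, not the paper's actual code: the class name, feature dimension, action count, and the `load_pretrained_midlevel_encoder` loader are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation):
# a frozen mid-level vision encoder feeds features to a small RL policy head,
# replacing raw pixels as the perceptual state.
import torch
import torch.nn as nn

class MidLevelPolicy(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, n_actions: int):
        super().__init__()
        self.encoder = encoder
        # Freeze the asynchronously pre-trained encoder: RL updates only the head.
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.policy_head = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),  # action logits for a discrete policy
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Encoder is fixed, so no gradients flow through it.
        with torch.no_grad():
            feats = self.encoder(obs).flatten(start_dim=1)
        return self.policy_head(feats)

# Hypothetical usage: plug in any mid-level encoder pre-trained on a vision
# objective (e.g. depth estimation or surface normals), then train only the
# head with the RL algorithm of choice.
# encoder = load_pretrained_midlevel_encoder()  # hypothetical loader
# policy = MidLevelPolicy(encoder, feat_dim=2048, n_actions=4)
```

Keeping the encoder frozen is what makes the representation "asynchronously trained" in the abstract's sense: the perception module is learned offline on computer-vision objectives and reused as-is, rather than updated by the RL loss.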