Paper Title

Learning Visual Locomotion with Cross-Modal Supervision

Paper Authors

Antonio Loquercio, Ashish Kumar, Jitendra Malik

Paper Abstract

In this work, we show how to learn a visual walking policy that only uses a monocular RGB camera and proprioception. Since simulating RGB is hard, we necessarily have to learn vision in the real world. We start with a blind walking policy trained in simulation. This policy can traverse some terrains in the real world but often struggles since it lacks knowledge of the upcoming geometry. This can be resolved with the use of vision. We train a visual module in the real world to predict the upcoming terrain with our proposed algorithm Cross-Modal Supervision (CMS). CMS uses time-shifted proprioception to supervise vision and allows the policy to continually improve with more real-world experience. We evaluate our vision-based walking policy over a diverse set of terrains including stairs (up to 19cm high), slippery slopes (inclination of 35 degrees), curbs and tall steps (up to 20cm), and complex discrete terrains. We achieve this performance with less than 30 minutes of real-world data. Finally, we show that our policy can adapt to shifts in the visual field with a limited amount of real-world experience. Video results and code at https://antonilo.github.io/vision_locomotion/.
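
The core mechanism of CMS described in the abstract can be sketched compactly: the image at time t is trained to predict the terrain information that proprioception only reveals once the robot has actually stepped onto that terrain, i.e. at time t + k. The sketch below is a minimal illustration under assumed tensor shapes, an assumed time shift k, and an assumed MSE loss; VisionModule and cms_update are illustrative names, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of Cross-Modal Supervision (CMS): a vision module is
# trained so that the image at time t predicts the terrain latent that
# proprioception only reveals k steps later. Shapes, the time shift k,
# and the loss are illustrative assumptions, not the paper's code.

class VisionModule(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )

    def forward(self, rgb):
        # rgb: (T, 3, H, W) -> predicted terrain latent, (T, latent_dim)
        return self.encoder(rgb)

def cms_update(vision, optimizer, images, proprio_latents, k=10):
    """One CMS step on a trajectory: the image at time t is supervised
    by the proprioception-derived latent observed at time t + k."""
    inputs = images[:-k]            # frames that "see" the upcoming terrain
    targets = proprio_latents[k:]   # what the legs later feel on that terrain
    pred = vision(inputs)
    loss = nn.functional.mse_loss(pred, targets.detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper, the supervised quantity is the proprioception-derived latent consumed by the walking policy; regressing the vision module onto a detached copy of that target, as above, is one natural reading of "time-shifted proprioception supervises vision", and it lets the policy keep improving as more real-world trajectories are collected.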
