Paper Title
Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining
Paper Authors
Paper Abstract
Deep visuomotor policy learning, which aims to map raw visual observations to actions, achieves promising results in control tasks such as robotic manipulation and autonomous driving. However, it requires a huge number of online interactions with the training environment, which limits its real-world application. Compared to the popular unsupervised feature learning for visual recognition, feature pretraining for visuomotor control tasks is much less explored. In this work, we aim to pretrain policy representations for driving tasks by watching hours of uncurated YouTube videos. Specifically, we train an inverse dynamics model with a small amount of labeled data and use it to predict action labels for all the YouTube video frames. A new contrastive policy pretraining method is then developed to learn action-conditioned features from the video frames with pseudo action labels. Experiments show that the resulting action-conditioned features yield substantial improvements on the downstream reinforcement learning and imitation learning tasks, outperforming weights pretrained by previous unsupervised learning methods as well as ImageNet-pretrained weights. Code, model weights, and data are available at: https://metadriverse.github.io/ACO.
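The pipeline above pseudo-labels each frame with an action and then trains features contrastively conditioned on those actions. As a rough illustration of the idea, here is a minimal NumPy sketch of an action-conditioned InfoNCE-style loss, in which two frames form a positive pair when their pseudo-action labels are close. The function name, the distance-threshold criterion for positives, and all parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def action_conditioned_infonce(features, actions, action_thresh=0.5, temperature=0.07):
    """Sketch of an action-conditioned contrastive loss (hypothetical form).

    features: (n, d) array of frame embeddings.
    actions:  (n, a) array of pseudo-action labels for the same frames.
    Two frames count as a positive pair when their actions are within
    `action_thresh` (an assumed criterion for this illustration).
    """
    # L2-normalize so dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature  # pairwise similarity logits

    n = len(actions)
    losses = []
    for i in range(n):
        # Positives: other frames whose pseudo-action is close to frame i's.
        pos = [j for j in range(n)
               if j != i and np.linalg.norm(actions[i] - actions[j]) < action_thresh]
        if not pos:
            continue  # no positive partner for this anchor
        logits = np.delete(sim[i], i)              # drop self-similarity
        log_denom = np.log(np.exp(logits).sum())   # log of the InfoNCE denominator
        idx = [j if j < i else j - 1 for j in pos]  # re-index after the deletion
        # Average -log p(positive | anchor) over all positives of this anchor.
        losses.append(np.mean([log_denom - logits[k] for k in idx]))
    return float(np.mean(losses))
```

In practice this would be computed on minibatch embeddings from the policy encoder and minimized by gradient descent; the sketch only shows how pseudo-action labels redefine which pairs are pulled together.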