Paper Title
VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors
Paper Authors
Paper Abstract
We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve the robustness of deep imitation learning algorithms against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms state-of-the-art imitation learning methods by $45.8\%$ in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA
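To make the pipeline described in the abstract concrete, the following is a minimal, illustrative sketch of its three stages: selecting object proposals, attending over their features, and predicting an action. It is not the authors' implementation. The abstract does not specify the proposal model, the feature extractor, or the policy architecture, so every name here (`top_k_proposals`, `attend`, `predict_action`), the single attention step standing in for a full transformer, and all numeric values are assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k_proposals(proposals, k):
    """Keep the k proposals with the highest score (hypothetical selection rule)."""
    return sorted(proposals, key=lambda p: p["score"], reverse=True)[:k]

def attend(query, tokens):
    """One scaled dot-product attention step; keys and values are both the tokens."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return pooled, weights

def predict_action(pooled, action_weights):
    """Linear head mapping the pooled feature to an action vector."""
    return [dot(row, pooled) for row in action_weights]

# Made-up proposals: in a real system the scores and features would come
# from a pre-trained vision model applied to the camera image.
proposals = [
    {"score": 0.9, "feature": [1.0, 0.0, 0.5]},
    {"score": 0.2, "feature": [0.0, 1.0, 0.0]},
    {"score": 0.7, "feature": [0.5, 0.5, 1.0]},
]

kept = top_k_proposals(proposals, k=2)
tokens = [p["feature"] for p in kept]

query = [1.0, 0.0, 1.0]                              # hypothetical learned action query
action_weights = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.0]]  # hypothetical 2-D action head

pooled, weights = attend(query, tokens)
action = predict_action(pooled, action_weights)
print("attention weights:", weights)
print("predicted action:", action)
```

The attention weights show which proposals the policy relies on for a given action, which is the sense in which the abstract says the policy attends to task-relevant visual factors. A trained policy would learn the query and head parameters from demonstrations rather than use fixed values.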