Paper title
KINet: Unsupervised Forward Models for Robotic Pushing Manipulation
Paper authors
Paper abstract
Object-centric representation is an essential abstraction for forward prediction. Most existing forward models learn this representation through extensive supervision (e.g., object class and bounding box) although such ground-truth information is not readily accessible in reality. To address this, we introduce KINet (Keypoint Interaction Network) -- an end-to-end unsupervised framework to reason about object interactions based on a keypoint representation. Using visual observations, our model learns to associate objects with keypoint coordinates and discovers a graph representation of the system as a set of keypoint embeddings and their relations. It then learns an action-conditioned forward model using contrastive estimation to predict future keypoint states. By learning to perform physical reasoning in the keypoint space, our model automatically generalizes to scenarios with a different number of objects, novel backgrounds, and unseen object geometries. Experiments demonstrate the effectiveness of our model in accurately performing forward prediction and learning plannable object-centric representations for downstream robotic pushing manipulation tasks.
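To make the pipeline in the abstract more concrete, the following is a minimal sketch of its last two steps: an action-conditioned forward model over a keypoint graph, and a contrastive loss on the predicted next state. It is not the authors' implementation. The function names, the fully connected graph, the single round of message passing, the weight shapes, and the InfoNCE-style loss with a squared-distance score are all assumptions chosen for illustration; the abstract only states that the model uses a keypoint graph and contrastive estimation.

```python
import numpy as np


def predict_next_keypoints(keypoints, action, w_edge, w_node):
    """One round of message passing over a fully connected keypoint graph.

    Hypothetical sketch, not the KINet architecture.
    keypoints: (K, 2) array of keypoint coordinates.
    action:    (A,) action vector, shared by every node.
    w_edge:    (4, H) weights mapping a (receiver, sender) pair to a message.
    w_node:    (2 + H + A, 2) weights mapping node features to a displacement.
    """
    k = keypoints.shape[0]
    # Build all ordered (receiver, sender) pairs, shape (K, K, 4).
    recv = np.repeat(keypoints[:, None, :], k, axis=1)
    send = np.repeat(keypoints[None, :, :], k, axis=0)
    pairs = np.concatenate([recv, send], axis=-1)
    messages = np.tanh(pairs @ w_edge)              # (K, K, H)
    # Zero out self-edges, then sum incoming messages per receiver.
    mask = 1.0 - np.eye(k)[:, :, None]
    aggregated = (messages * mask).sum(axis=1)      # (K, H)
    act = np.repeat(action[None, :], k, axis=0)     # (K, A)
    node_in = np.concatenate([keypoints, aggregated, act], axis=-1)
    # Predict a displacement and add it to the current coordinates.
    return keypoints + node_in @ w_node


def info_nce_loss(predicted, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: the true next state should outscore the negatives.

    predicted: (D,) flattened predicted state.
    positive:  (D,) flattened true next state.
    negatives: (M, D) flattened states taken from other samples.
    """
    candidates = np.vstack([positive[None, :], negatives])   # (M + 1, D)
    # Score each candidate by negative squared distance to the prediction.
    logits = -np.sum((candidates - predicted) ** 2, axis=1) / temperature
    logits = logits - logits.max()                            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[0]
```

Because the edge and node weights are shared across all keypoints, the same parameters apply to a graph with any number of keypoints. This is one plausible reading of why reasoning in keypoint space can transfer to scenes with a different number of objects.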