Paper Title
Visuo-Tactile Transformers for Manipulation
Paper Authors
Paper Abstract
Learning representations in the joint domain of vision and touch can improve manipulation dexterity, robustness, and sample complexity by exploiting mutual information and complementary cues. Here, we present Visuo-Tactile Transformers (VTT), a novel multimodal representation learning approach suited for model-based reinforcement learning and planning. Our approach extends the Vision Transformer \cite{dosovitskiy2021image} to handle visuo-tactile feedback. Specifically, VTT uses tactile feedback together with self- and cross-modal attention to build latent heatmap representations that focus attention on important task features in the visual domain. We demonstrate the efficacy of VTT for representation learning with a comparative evaluation against baselines on four simulated robot tasks and one real-world block-pushing task. We conduct an ablation study over the components of VTT to highlight the importance of cross-modality in representation learning.
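As a rough illustration of the fusion mechanism the abstract describes, the sketch below embeds ViT-style visual patch tokens and a tactile token into a shared latent space and runs transformer attention over the concatenated sequence, so tokens attend both within and across modalities. This is a minimal sketch under assumed shapes and names (`VisuoTactileFusion`, `tactile_dim`, patch and embedding sizes are all illustrative), not the authors' VTT implementation.

```python
# Minimal illustrative sketch (NOT the authors' VTT implementation) of
# fusing visual patch tokens with a tactile token via transformer
# attention. Shapes, hyperparameters, and the tactile input format
# (a single 6-D force/torque reading) are assumptions.
import torch
import torch.nn as nn

class VisuoTactileFusion(nn.Module):
    def __init__(self, img_size=64, patch=8, tactile_dim=6,
                 dim=128, heads=4, layers=2):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Visual tokens: non-overlapping patch embeddings, as in ViT.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Tactile token: one embedded force/torque reading (assumed format).
        self.tactile_embed = nn.Linear(tactile_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        # Attention over the concatenated sequence mixes modalities: each
        # token attends to its own modality (self-attention) and to the
        # other (one simple way to realize cross-modal attention).
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)

    def forward(self, image, tactile):
        # image: (B, 3, H, W); tactile: (B, tactile_dim)
        v = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, N, dim)
        t = self.tactile_embed(tactile).unsqueeze(1)            # (B, 1, dim)
        tokens = torch.cat([v, t], dim=1) + self.pos
        return self.encoder(tokens)  # fused latent tokens

fusion = VisuoTactileFusion()
z = fusion(torch.randn(2, 3, 64, 64), torch.randn(2, 6))
print(z.shape)  # torch.Size([2, 65, 128])
```

In a setup like this, the attention weights over the visual patch tokens can be read out as a spatial map, loosely analogous to the latent heatmap representation the abstract mentions, and the fused tokens would feed a downstream model-based RL agent or planner.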