Paper title
Practical Cross-modal Manifold Alignment for Grounded Language
Paper authors
Abstract
We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
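The triplet objective named in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance function, the margin, or the embedding networks, so the cosine distance, the `margin=0.4` default, and the function names `cosine_distance` and `triplet_loss` below are all assumptions made for illustration, not the authors' implementation. The vectors stand in for embeddings that would, in practice, come from separate vision and language encoders.

```python
import numpy as np


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two 1-D vectors (assumed metric)."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def triplet_loss(anchor, positive, negative, margin=0.4):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (a hypothetical value chosen here).
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: the anchor could be an image embedding, the positive a
# matching description, and the negative a description of another object.
anchor = np.array([1.0, 0.0])
positive = np.array([1.0, 0.0])
negative = np.array([0.0, 1.0])

well_separated = triplet_loss(anchor, positive, negative)
# Swapping the roles gives a triplet that violates the margin.
violating = triplet_loss(anchor, negative, positive)
```

In this toy case the first triplet already satisfies the margin, so it contributes no loss, while the swapped triplet is penalized. Training on such triplets pushes matching image and language embeddings together and mismatched ones apart.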