Robust and Interpretable Grounding of Spatial References with Relation Networks

Paper Title

Robust and Interpretable Grounding of Spatial References with Relation Networks

Authors

Tsung-Yen Yang, Andrew S. Lan, Karthik Narasimhan

Abstract

Learning representations of spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal representations for spatial concepts. However, the lack of explicit reasoning over entities makes such approaches vulnerable to noise in input text or state observations. In this paper, we develop effective models for understanding spatial references in text that are robust and interpretable, without sacrificing performance. We design a text-conditioned relation network whose parameters are dynamically computed with a cross-modal attention module to capture fine-grained spatial relations between entities. This design choice provides interpretability of learned intermediate outputs. Experiments across three tasks demonstrate that our model achieves superior performance, with a 17% improvement in predicting goal locations and a 15% improvement in robustness compared to state-of-the-art systems.
