Paper Title
Presentation and Analysis of a Multimodal Dataset for Grounded Language Learning
Paper Authors
Paper Abstract
Grounded language acquisition -- learning how language-based interactions refer to the world around them -- is a major area of research in robotics, NLP, and HCI. In practice, the data used for learning consists almost entirely of textual descriptions, which tend to be cleaner, clearer, and more grammatical than actual human interactions. In this work, we present the Grounded Language Dataset (GoLD), a multimodal dataset of common household objects described by people using either spoken or written language. We analyze the differences between these modalities and present an experiment showing how they affect language learning from human input. This will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, text, and speech interact, as well as how differences in the vernacular of these modalities impact results.