Paper Title
Language Model-Based Paired Variational Autoencoders for Robotic Language Learning
Paper Authors
Paper Abstract
Human infants learn language while interacting with their environment, in which their caregivers may describe the objects and actions they perform. Similar to human infants, artificial agents can learn language while interacting with their environment. In this work, we first present a neural model that bidirectionally binds robot actions and their language descriptions in a simple object-manipulation scenario. Building on our previous Paired Variational Autoencoders (PVAE) model, we demonstrate the superiority of variational autoencoders over standard autoencoders by experimenting with cubes of different colours and by enabling the production of alternative vocabularies. Additional experiments show that the model's channel-separated visual feature extraction module can cope with objects of different shapes. Next, we introduce PVAE-BERT, which equips the model with a pretrained large-scale language model, namely Bidirectional Encoder Representations from Transformers (BERT), enabling it to go beyond comprehending only the predefined descriptions on which the network has been trained; recognition of action descriptions thus generalises to unconstrained natural language, as the model becomes capable of understanding unlimited variations of the same descriptions. Our experiments suggest that using a pretrained language model as the language encoder allows our approach to scale up to real-world scenarios with instructions from human users.
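To make the described architecture concrete, below is a minimal PyTorch sketch of the bidirectional binding idea from the abstract: a pretrained BERT model encodes the description, a recurrent encoder embeds the action trajectory, and an alignment term binds the two latent distributions so that either modality could be decoded from the other. All module names, dimensions, and the MSE binding term are illustrative assumptions, not the paper's published implementation.

```python
# Illustrative sketch only -- module names, sizes, and the MSE binding
# term are assumptions, not the authors' published implementation.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class ActionEncoder(nn.Module):
    """Encodes a joint-angle/visual-feature sequence into a Gaussian latent."""
    def __init__(self, input_dim=20, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):                       # x: (batch, time, input_dim)
        _, (h, _) = self.rnn(x)
        h = h[-1]                                # final hidden state
        return self.mu(h), self.logvar(h)


class BertLanguageEncoder(nn.Module):
    """Pretrained BERT in place of a from-scratch recurrent language encoder."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.mu = nn.Linear(self.bert.config.hidden_size, latent_dim)
        self.logvar = nn.Linear(self.bert.config.hidden_size, latent_dim)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]        # [CLS] token embedding
        return self.mu(cls), self.logvar(cls)


def reparameterise(mu, logvar):
    """Standard VAE reparameterisation trick."""
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)


def binding_loss(z_lang, z_act):
    """Pulls the language and action latent codes together (assumed MSE)."""
    return nn.functional.mse_loss(z_lang, z_act)


# Usage: encode one description and one dummy 50-step action trajectory,
# then compute the term that binds the two latent spaces.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer(["push the red cube slowly"], return_tensors="pt")
lang_enc, act_enc = BertLanguageEncoder(), ActionEncoder()
mu_l, lv_l = lang_enc(tokens["input_ids"], tokens["attention_mask"])
mu_a, lv_a = act_enc(torch.randn(1, 50, 20))
loss = binding_loss(reparameterise(mu_l, lv_l), reparameterise(mu_a, lv_a))
```

In the full model, each encoder would be paired with a decoder and trained with reconstruction and KL-divergence losses as in a standard VAE; the sketch isolates only the cross-modal binding step that the abstract refers to.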