Paper Title

Visual Perception Generalization for Vision-and-Language Navigation via Meta-Learning

Authors

Ting Wang, Zongkai Wu, Donglin Wang

Abstract

Vision-and-language navigation (VLN) is a challenging task that requires an agent to navigate in real-world environments by understanding natural language instructions and visual information received in real time. Prior works have implemented VLN tasks in continuous environments or on physical robots, all of which use a fixed camera configuration due to the limitations of datasets, such as a camera height of 1.5 meters and a 90-degree horizontal field of view (HFOV). However, real-life robots with different purposes have multiple camera configurations, and the large gap in visual information makes it difficult to directly transfer the learned navigation model between various robots. In this paper, we propose a visual perception generalization strategy based on meta-learning, which enables the agent to quickly adapt to a new camera configuration with only a few shots. In the training phase, we first localize the generalization problem to the visual perception module, and then compare two meta-learning algorithms for better generalization in seen and unseen environments. One of them uses the Model-Agnostic Meta-Learning (MAML) algorithm, which requires few-shot adaptation, and the other is a metric-based meta-learning method with a feature-wise affine transformation layer. The experimental results show that our strategy successfully adapts the learned navigation model to a new camera configuration, and the two algorithms show their respective advantages in seen and unseen environments.
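
For illustration only, below is a minimal sketch (not the authors' released code) of how MAML-style few-shot adaptation could be applied to a visual perception encoder: the inner loop adapts the encoder on a handful of images from one camera configuration, and the outer loop meta-updates the shared initialization across configurations. The encoder architecture, the feature-regression loss, and the task batches are hypothetical placeholders chosen only to keep the example self-contained.

```python
# Hypothetical MAML sketch for adapting a visual perception encoder to new
# camera configurations; all module names, losses, and data are placeholders.
import torch
import torch.nn as nn
from torch.func import functional_call


class PerceptionEncoder(nn.Module):
    """Tiny CNN standing in for the VLN visual perception module."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)


def inner_adapt(model, params, support_x, support_y, inner_lr=0.01, steps=1):
    """Gradient steps on the support set of one task
    (e.g. a few images from a single camera configuration)."""
    for _ in range(steps):
        preds = functional_call(model, params, (support_x,))
        loss = nn.functional.mse_loss(preds, support_y)
        grads = torch.autograd.grad(loss, tuple(params.values()), create_graph=True)
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}
    return params


def maml_outer_step(model, optimizer, tasks, inner_lr=0.01):
    """Meta-update over a batch of tasks; each task is
    (support_x, support_y, query_x, query_y)."""
    optimizer.zero_grad()
    meta_loss = 0.0
    base_params = dict(model.named_parameters())
    for support_x, support_y, query_x, query_y in tasks:
        adapted = inner_adapt(model, base_params, support_x, support_y, inner_lr)
        query_preds = functional_call(model, adapted, (query_x,))
        meta_loss = meta_loss + nn.functional.mse_loss(query_preds, query_y)
    meta_loss = meta_loss / len(tasks)
    meta_loss.backward()
    optimizer.step()
    return meta_loss.item()


if __name__ == "__main__":
    # Fake data: targets are reference features the adapted encoder should match.
    model = PerceptionEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    tasks = [(torch.randn(4, 3, 64, 64), torch.randn(4, 128),
              torch.randn(4, 3, 64, 64), torch.randn(4, 128)) for _ in range(2)]
    print(maml_outer_step(model, opt, tasks))
```

The second meta-learning variant mentioned in the abstract (metric-based with a feature-wise affine transformation layer) avoids gradient-based adaptation at test time; the sketch above covers only the MAML branch.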
