Paper Title
Few-shot Sequence Learning with Transformers
Paper Authors
Paper Abstract
Few-shot algorithms aim to learn new tasks from only a handful of training examples. In this work, we investigate few-shot learning in the setting where the data points are sequences of tokens, and we propose an efficient learning algorithm based on Transformers. In the simplest setting, we append to the input sequence a token that represents the particular task to be performed, and show that the embedding of this token can be optimized on the fly given a few labeled examples. Our approach does not require complicated changes to the model architecture, such as adapter layers, nor does it require computing second-order derivatives, as is currently popular in the meta-learning and few-shot learning literature. We demonstrate our approach on a variety of tasks and analyze the generalization properties of several model variants and baseline approaches. In particular, we show that compositional task descriptors can improve performance. Experiments show that our approach works at least as well as other methods while being more computationally efficient.
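To make the core mechanism concrete, below is a minimal PyTorch sketch of the idea as the abstract describes it: a pretrained Transformer is kept frozen, a single task token is appended to the input, and only that token's embedding is optimized with ordinary first-order gradient steps on the few labeled examples. The toy model, dimensions, and names (`d_model`, `task_token`, etc.) are illustrative assumptions, not the authors' released implementation.

```python
# Sketch (not the paper's code): freeze a pretrained Transformer and
# optimize only the embedding of one appended "task token" on the
# few-shot support set. Model sizes and names here are assumptions.
import torch
import torch.nn as nn

d_model, vocab_size, num_classes = 64, 100, 5

# Stand-in for a pretrained model: token embeddings, a Transformer
# encoder, and a classification head. All of these stay frozen.
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(d_model, num_classes)
for module in (embed, encoder, head):
    module.eval()  # disable dropout in the frozen model
    for p in module.parameters():
        p.requires_grad = False

# The only trainable parameter: one embedding vector for the new task.
task_token = nn.Parameter(torch.randn(1, 1, d_model) * 0.02)
optimizer = torch.optim.Adam([task_token], lr=1e-2)

def forward(input_ids):
    """Append the task token to the embedded input and classify
    from the representation at the task-token position."""
    x = embed(input_ids)                      # (B, T, d_model)
    t = task_token.expand(x.size(0), -1, -1)  # (B, 1, d_model)
    h = encoder(torch.cat([x, t], dim=1))     # (B, T+1, d_model)
    return head(h[:, -1])                     # (B, num_classes)

# Few-shot adaptation: plain first-order gradient steps on a handful
# of labeled sequences; no second-order derivatives, no adapter layers.
support_x = torch.randint(0, vocab_size, (8, 12))  # 8 examples, length 12
support_y = torch.randint(0, num_classes, (8,))
loss_fn = nn.CrossEntropyLoss()
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(forward(support_x), support_y)
    loss.backward()  # gradients reach only task_token; the rest is frozen
    optimizer.step()
```

Because adaptation trains only a single `d_model`-sized vector per task while the pretrained weights are shared across all tasks, this setup reflects the computational efficiency the abstract claims relative to methods that modify the architecture or backpropagate through the adaptation process.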