Paper title
Adaptable Adapters
Paper authors
Paper abstract
State-of-the-art pretrained NLP models contain a hundred million to trillion parameters. Adapters provide a parameter-efficient alternative to full finetuning, in which only lightweight neural network layers are finetuned on top of the pretrained weights. Adapter layers are initialized randomly. However, existing work uses the same adapter architecture -- i.e., the same adapter layer on top of each layer of the pretrained model -- for every dataset, regardless of the properties of the dataset or the amount of available training data. In this work, we introduce adaptable adapters, which (1) learn different activation functions for different layers and different input data, and (2) include a learnable switch to select and use only the beneficial adapter layers. We show that adaptable adapters achieve on-par performance with the standard adapter architecture while using a considerably smaller number of adapter layers. In addition, we show that the adapter architecture selected by adaptable adapters transfers well across different data settings and similar tasks. We propose to use adaptable adapters for designing efficient and effective adapter architectures. The resulting adapters (a) contain about 50% of the learnable parameters of the standard adapter and are therefore more efficient at training and inference and require less storage space, and (b) achieve considerably higher performance in low-data settings.
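A minimal PyTorch sketch of the two ideas named in the abstract: a learnable activation function per adapter layer and a learnable switch that gates each adapter layer so that unhelpful layers can be dropped. The concrete parameterizations below (a softmax mixture over a few fixed activations, a sigmoid gate, a bottleneck size of 64) are illustrative assumptions for readability, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableActivation(nn.Module):
    """Learnable activation as a weighted mix of candidate functions.

    Hypothetical stand-in for the paper's per-layer learnable activations;
    the abstract does not specify the parameterization.
    """

    def __init__(self):
        super().__init__()
        # One learnable mixing weight per candidate activation.
        self.weights = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        w = torch.softmax(self.weights, dim=0)
        return w[0] * F.relu(x) + w[1] * torch.tanh(x) + w[2] * F.gelu(x)


class AdaptableAdapterLayer(nn.Module):
    """Bottleneck adapter with a learnable activation and a learnable switch.

    The switch is a single parameter passed through a sigmoid; adapter layers
    whose gate stays near zero can be removed entirely after training, which
    is how the selected architecture ends up with fewer adapter layers.
    """

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.activation = LearnableActivation()
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.switch = nn.Parameter(torch.zeros(1))  # learnable layer switch

    def forward(self, hidden_states):
        gate = torch.sigmoid(self.switch)
        adapter_out = self.up(self.activation(self.down(hidden_states)))
        # When the gate is ~0, the layer reduces to the residual connection.
        return hidden_states + gate * adapter_out


# Usage: one adaptable adapter layer per pretrained transformer layer.
layer = AdaptableAdapterLayer(hidden_size=768)
x = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
print(layer(x).shape)  # torch.Size([2, 16, 768])
```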