Paper Title

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Authors

Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, Hung-yi Lee

Abstract

Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed. AdapterBias adds a token-dependent shift to the hidden output of transformer layers to adapt to downstream tasks using only a vector and a linear layer. Extensive experiments are conducted to demonstrate the effectiveness of AdapterBias. The experiments show that our proposed method can dramatically reduce the number of trainable parameters compared to previous works, with a minimal decrease in task performance relative to fine-tuned pre-trained models. We further find that AdapterBias automatically learns to assign more significant representation shifts to the tokens related to the task under consideration.
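The shift mechanism described in the abstract is simple enough to sketch directly. Below is a minimal PyTorch illustration, not the authors' released code: the module name, the scalar-per-token weighting, the zero initialization of the vector, and the placement after a transformer layer's output are all assumptions made for the sake of the example.

```python
import torch
import torch.nn as nn

class AdapterBias(nn.Module):
    """Minimal sketch of a token-dependent representation shift.

    The shift for token i is alpha_i * v, where v is a single learned
    vector shared across tokens and alpha_i is a per-token scalar
    produced by a linear layer (an assumption based on the abstract's
    "only a vector and a linear layer").
    """

    def __init__(self, hidden_dim: int):
        super().__init__()
        # One shared shift direction of the hidden size; zero init so the
        # adapter starts as a no-op (an assumption, not from the paper).
        self.v = nn.Parameter(torch.zeros(hidden_dim))
        # Linear layer mapping each token's representation to a scalar weight.
        self.alpha = nn.Linear(hidden_dim, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim)
        a = self.alpha(hidden)          # (batch, seq_len, 1), one weight per token
        return hidden + a * self.v      # broadcast: each token gets its own scaled v

# Usage: only the adapter's parameters (v and the linear layer) would be
# trained, with the pre-trained transformer kept frozen.
adapter = AdapterBias(hidden_dim=768)
h = torch.randn(2, 16, 768)             # dummy hidden states (batch, seq, dim)
print(adapter(h).shape)                 # torch.Size([2, 16, 768])
```

Under these assumptions, a 768-dimensional hidden state costs only 768 parameters for v plus 768 + 1 for the linear layer per adapted output, which is consistent with the abstract's claim of dramatically fewer trainable parameters than conventional adapters.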
