Paper Title

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning

Authors

Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei

Abstract

While transferring a pretrained language model, common approaches conventionally attach their task-specific classifiers to the top layer and adapt all the pretrained layers. We investigate whether one could make a task-specific selection on which subset of the layers to adapt and where to place the classifier. The goal is to reduce the computation cost of transfer learning methods (e.g., fine-tuning or adapter-tuning) without sacrificing their performance. We propose to select layers based on the variability of their hidden states given a task-specific corpus. We say a layer is already "well-specialized" in a task if the within-class variability of its hidden states is low relative to the between-class variability. Our variability metric is cheap to compute and does not need any training or hyperparameter tuning. It is robust to data imbalance and data scarcity. Extensive experiments on the GLUE benchmark demonstrate that selecting layers based on our metric can yield significantly stronger performance than using the same number of top layers and often match the performance of fine-tuning or adapter-tuning the entire language model.
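
The abstract does not spell out the exact formula, so the snippet below is only a minimal sketch of one way to compute a within-class versus between-class variability ratio for a single layer's pooled hidden states. The function name variability_ratio, the mean pooling, and the squared-distance formulation are illustrative assumptions rather than the paper's precise metric.

```python
import numpy as np

def variability_ratio(hidden_states, labels):
    """Within-class vs. between-class variability of one layer's hidden states.

    hidden_states: (N, D) array, one pooled hidden vector per example.
    labels: (N,) array of class ids.
    Returns a scalar; a lower value suggests the layer already separates
    the task's classes well (i.e., is "well-specialized").
    NOTE: illustrative sketch only; the paper's exact metric may differ.
    """
    hidden_states = np.asarray(hidden_states, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    global_mean = hidden_states.mean(axis=0)
    within, between = 0.0, 0.0
    for c in classes:
        class_states = hidden_states[labels == c]
        class_mean = class_states.mean(axis=0)
        # Average squared distance of examples to their own class mean.
        within += np.mean(np.sum((class_states - class_mean) ** 2, axis=1))
        # Squared distance of the class mean to the global mean.
        between += np.sum((class_mean - global_mean) ** 2)

    within /= len(classes)
    between /= len(classes)
    return within / (between + 1e-12)
```

Under these assumptions, one would compute this ratio for every layer of the pretrained model on a task-specific corpus (e.g., from per-example sentence representations) and treat layers with a low ratio as already "well-specialized", which is the kind of signal the paper uses to decide which layers to adapt and where to place the classifier.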
