Paper Title
Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling
Paper Authors
Paper Abstract
Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations. Since the attributes captured in the stacked layers of PLMs are not clearly identified, straightforward approaches such as taking the last layer's embedding are commonly preferred for deriving sentence representations from PLMs. This paper introduces an attention-based pooling strategy that enables the model to preserve the layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks. A contrastive learning objective adapts the layer-wise attention pooling to both unsupervised and supervised settings. This regularizes the anisotropic space of pre-trained embeddings and makes it more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. As a result, our method improves the performance of contrastively learned BERT_base and its variants.
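
To make the pooling idea concrete, the following is a minimal PyTorch sketch of layer-wise attention pooling combined with an unsupervised contrastive (InfoNCE) objective. The class and function names, the single learnable query vector over layers, and the SimCSE-style dropout positives are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class LayerWiseAttentionPooler(nn.Module):
    # Pools the [CLS] state of every encoder layer with a learned attention query
    # (an assumed design; the paper's pooling parameterization may differ).
    def __init__(self, hidden_size):
        super().__init__()
        self.query = nn.Parameter(torch.randn(hidden_size))  # scores each layer

    def forward(self, hidden_states):
        # hidden_states: tuple of (num_layers + 1) tensors [batch, seq, hidden];
        # index 0 is the embedding layer, which we skip.
        cls_per_layer = torch.stack([h[:, 0] for h in hidden_states[1:]], dim=1)
        weights = F.softmax(cls_per_layer @ self.query, dim=-1)    # [batch, layers]
        return (weights.unsqueeze(-1) * cls_per_layer).sum(dim=1)  # [batch, hidden]

def info_nce(z1, z2, temperature=0.05):
    # Standard in-batch contrastive (InfoNCE) loss for the unsupervised setting.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature
    return F.cross_entropy(logits, torch.arange(z1.size(0), device=z1.device))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.train()  # keep dropout active so two passes give different "views"
pooler = LayerWiseAttentionPooler(encoder.config.hidden_size)

batch = tokenizer(["a sentence", "another sentence"],
                  padding=True, return_tensors="pt")
emb1 = pooler(encoder(**batch, output_hidden_states=True).hidden_states)
emb2 = pooler(encoder(**batch, output_hidden_states=True).hidden_states)
loss = info_nce(emb1, emb2)  # pull the two views of each sentence together

In the supervised setting described in the abstract, the same pooled embeddings would instead be contrasted against labeled positive and negative sentence pairs rather than dropout-perturbed copies.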