Paper Title

Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation

Paper Authors

Xuandong Zhao, Zhiguo Yu, Ming Wu, Lei Li

Paper Abstract

How can we learn highly compact yet effective sentence representations? Pre-trained language models have been effective in many NLP tasks. However, these models are often huge and produce large sentence embeddings. Moreover, there is a big performance gap between large and small models. In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings. Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations while mimicking a large pre-trained language model to retain sentence representation quality. We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks. Experiments show that our method achieves a 2.7-4.5 point performance gain on STS tasks compared with the previous best representations of the same size. On SR tasks, our method improves retrieval speed by 8.2$\times$ and reduces memory usage by 8.0$\times$ compared with state-of-the-art large models.
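The abstract's core recipe, a small student encoder with a learnable projection head trained to mimic a frozen large teacher, can be sketched in a few lines. Below is a minimal illustration assuming a PyTorch + Hugging Face setup; the MiniLM checkpoint name, mean pooling, the 128-dimensional projection, PCA-reduced teacher targets, and the MSE objective are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the HPD idea described in the abstract (assumptions noted).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ProjectedStudent(nn.Module):
    """Small Transformer encoder + learnable projection emitting compact embeddings."""
    def __init__(self, name="nreimers/MiniLM-L6-H384-uncased", compact_dim=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        # Learnable projection down to the compact dimension that is stored/served.
        self.proj = nn.Linear(self.encoder.config.hidden_size, compact_dim)

    def forward(self, **inputs):
        hidden = self.encoder(**inputs).last_hidden_state
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean pooling over tokens
        return self.proj(pooled)                       # compact sentence embedding

student = ProjectedStudent()
tokenizer = AutoTokenizer.from_pretrained("nreimers/MiniLM-L6-H384-uncased")
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

def distill_step(sentences, teacher_emb_compact):
    """One distillation step. `teacher_emb_compact` is assumed to be the frozen
    large teacher's sentence embeddings already reduced to `compact_dim`
    (e.g., via a PCA fit offline) so the dimensions match."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    student_emb = student(**batch)
    loss = nn.functional.mse_loss(student_emb, teacher_emb_compact)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At retrieval time only the small encoder and projection are kept, so each sentence is represented by a 128-dimensional vector instead of the teacher's full-size embedding, which is where the speed and memory gains reported in the abstract come from.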
