Paper Title


Ered: Enhanced Text Representations with Entities and Descriptions

Authors

Qinghua Zhao, Shuai Ma, Yuxuan Lei

Abstract


External knowledge, e.g., entities and entity descriptions, can help humans understand texts. Many works have explored including external knowledge in pre-trained models. Generally, these methods either design pre-training tasks and introduce knowledge implicitly by updating model weights, or concatenate the knowledge directly with the original text. Though effective, they have limitations. On the one hand, the implicit approach attends only to model weights, while pre-trained entity embeddings are ignored. On the other hand, entity descriptions may be lengthy, and feeding them into the model together with the original text may distract the model's attention. This paper aims to explicitly include both entities and entity descriptions in the fine-tuning stage. First, pre-trained entity embeddings are fused with the original text representation and updated by the backbone model layer by layer. Second, descriptions are represented by a knowledge module outside the backbone model, and each knowledge layer is selectively connected to one backbone layer for fusing. Third, two knowledge-related auxiliary tasks, i.e., the entity/description enhancement task and the entity enhancement/pollution task, are designed to smooth the semantic gaps among the evolved representations. We conducted experiments on four knowledge-oriented tasks and two common tasks, achieving new state-of-the-art results on several datasets. Besides, an ablation study shows that each module in our method is necessary. The code is available at https://github.com/lshowway/Ered.
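The abstract describes two fusion paths: entity embeddings added into the text representation that the backbone updates layer by layer, and a knowledge module whose layers are each selectively connected to one backbone layer. The following is a minimal numpy sketch of that dataflow only; the layer function, dimensions, additive fusion, and the `connect` mapping are all hypothetical stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
HID, N_BACKBONE, N_KNOW = 8, 4, 2
# hypothetical mapping: knowledge layer i fuses into backbone layer connect[i]
connect = {0: 1, 1: 3}

def layer(x, W):
    # stand-in for one transformer layer: linear map + nonlinearity
    return np.tanh(x @ W)

W_backbone = [rng.standard_normal((HID, HID)) * 0.1 for _ in range(N_BACKBONE)]
W_know = [rng.standard_normal((HID, HID)) * 0.1 for _ in range(N_KNOW)]

text_repr = rng.standard_normal((5, HID))    # token representations
entity_emb = rng.standard_normal((5, HID))   # pre-trained entity embeddings
desc_repr = rng.standard_normal((5, HID))    # description input to knowledge module

# (1) entity embeddings fused with the text representation, then
#     updated by the backbone layer by layer
h = text_repr + entity_emb

# (2) the knowledge module runs outside the backbone; each of its layer
#     outputs is routed to one selected backbone layer for fusing
k = desc_repr
know_out = {}
for i, Wk in enumerate(W_know):
    k = layer(k, Wk)
    know_out[connect[i]] = k

for j, Wb in enumerate(W_backbone):
    h = layer(h, Wb)
    if j in know_out:            # fuse description knowledge at this layer
        h = h + know_out[j]

print(h.shape)  # (5, 8)
```

The point of the sketch is the wiring: entities enter at the input and flow through every backbone layer, while description knowledge enters only at the selected connection points.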
