Paper Title
LEMON: LanguagE ModeL for Negative Sampling of Knowledge Graph Embeddings
Authors
Abstract
Knowledge Graph Embedding models have become an important area of machine learning. These models provide latent representations of the entities and relations in a knowledge graph (KG), which can then be used in downstream machine learning tasks such as link prediction. Such models can be trained by contrasting positive and negative triples. While all triples of a KG are considered positive, negative triples are usually not readily available. Therefore, the choice of the sampling method used to obtain negative triples plays a crucial role in the performance and effectiveness of Knowledge Graph Embedding models. Most current methods draw negative samples from a random distribution over the entities in the underlying knowledge graph, which often yields meaningless triples. Other known methods use adversarial techniques or generative neural networks, which reduce the efficiency of the process. In this paper, we propose an approach for generating informative negative samples that takes into account available complementary knowledge about entities. In particular, pre-trained language models are used to obtain representations of symbolic entities from their textual information, and neighborhood clusters are formed from the distances between these entities. Our comprehensive evaluations demonstrate the effectiveness of the proposed approach for the link prediction task on benchmark knowledge graphs with textual information.
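The core idea of the abstract, drawing negatives from entities that lie close to the true entity in a text-derived embedding space, can be illustrated with a minimal sketch. It is not the paper's implementation: the two-dimensional vectors in `EMBEDDINGS` are made-up stand-ins for language-model embeddings, the toy triples are invented, the function names `neighborhood` and `sample_negative` are hypothetical, and a plain k-nearest-neighbor lookup is used here in place of whatever clustering procedure the authors apply.

```python
import math
import random

# Toy stand-ins for language-model embeddings of entity descriptions.
# These values are assumptions for illustration, not real model output.
EMBEDDINGS = {
    "Berlin": [0.9, 0.1],
    "Paris": [0.8, 0.2],
    "Rome": [0.85, 0.15],
    "Germany": [0.1, 0.9],
    "France": [0.2, 0.8],
    "Italy": [0.15, 0.85],
}

# Positive triples of a toy knowledge graph (also assumed).
TRIPLES = {
    ("Berlin", "capital_of", "Germany"),
    ("Paris", "capital_of", "France"),
    ("Rome", "capital_of", "Italy"),
}


def neighborhood(entity, embeddings, k):
    """Return the k entities closest to `entity` by Euclidean distance."""
    others = [e for e in embeddings if e != entity]
    others.sort(key=lambda e: math.dist(embeddings[entity], embeddings[e]))
    return others[:k]


def sample_negative(triple, triples, embeddings, k=2, rng=random):
    """Corrupt the tail with a nearby entity, skipping known positives.

    Returns None when every neighbor would yield a known positive triple.
    """
    head, relation, tail = triple
    candidates = [
        e for e in neighborhood(tail, embeddings, k)
        if (head, relation, e) not in triples
    ]
    if not candidates:
        return None
    return (head, relation, rng.choice(candidates))
```

For the triple `("Berlin", "capital_of", "Germany")`, this sketch replaces the tail with another country rather than with an arbitrary entity such as a city, which is the kind of plausible but false triple that the abstract calls an informative negative sample.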