Walk Extraction Strategies for Node Embeddings with RDF2Vec in Knowledge Graphs

Paper Title

Walk Extraction Strategies for Node Embeddings with RDF2Vec in Knowledge Graphs

Authors

Gilles Vandewiele, Bram Steenwinckel, Pieter Bonte, Michael Weyns, Heiko Paulheim, Petar Ristoski, Filip De Turck, Femke Ongenae

Abstract

Because knowledge graphs (KGs) are symbolic constructs, specialized techniques are needed to make them compatible with data mining techniques. RDF2Vec is an unsupervised technique that creates task-agnostic numerical representations of the nodes in a KG by extending successful language modelling techniques. The original work proposed using the Weisfeiler-Lehman (WL) kernel to improve the quality of the representations. In this work, however, we show both formally and empirically that the WL kernel does little to improve walk embeddings in the context of a single KG. As an alternative to the WL kernel, we propose five different strategies to extract information complementary to basic random walks. We compare these walks on several benchmark datasets and show that the n-gram strategy performs best on average on node classification tasks, and that tuning the walk strategy can improve predictive performance.
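
The "basic random walks" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not attempt any of the five proposed strategies. The toy triples, the helper names `build_adjacency` and `random_walks`, and the parameters `depth`, `n_walks` and `seed` are all assumptions made for illustration. Each walk alternates nodes and predicates, so it can be treated as a "sentence" for a language model such as word2vec.

```python
import random

# Toy knowledge graph as (subject, predicate, object) triples.
# All entity and predicate names are invented for this example.
TRIPLES = [
    ("alice", "worksAt", "acme"),
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("acme", "locatedIn", "ghent"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))
    return adjacency


def random_walks(adjacency, root, depth, n_walks, seed=0):
    """Extract n_walks random walks of at most `depth` hops from `root`.

    A walk is a tuple (root, p1, o1, p2, o2, ...). It stops early
    when the current node has no outgoing edges.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    walks = []
    for _ in range(n_walks):
        walk = [root]
        node = root
        for _ in range(depth):
            edges = adjacency.get(node)
            if not edges:
                break
            predicate, obj = rng.choice(edges)
            walk.extend([predicate, obj])
            node = obj
        walks.append(tuple(walk))
    return walks


adjacency = build_adjacency(TRIPLES)
walks = random_walks(adjacency, "alice", depth=2, n_walks=5)
for walk in walks:
    print(" -> ".join(walk))
```

In an RDF2Vec-style pipeline, walks like these would be collected for every node of interest and passed to an embedding model; the walk strategies compared in the paper change how these sequences are extracted or transformed before that step.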
