Paper Title
Contrastive Knowledge Graph Error Detection
Paper Authors
Paper Abstract
Knowledge Graph (KG) errors introduce non-negligible noise, severely affecting KG-related downstream tasks. Detecting errors in KGs is challenging since the patterns of errors are unknown and diverse, while ground-truth labels are rare or even unavailable. A traditional solution is to construct logical rules to verify triples, but it does not generalize because different KGs have distinct rules that involve domain knowledge. Recent studies focus on designing tailored detectors or ranking triples based on KG embedding loss. However, they all rely on negative samples for training, which are generated by randomly replacing the head or tail entity of existing triples. Such a negative sampling strategy is insufficient for modeling practical KG errors, e.g., (Bruce_Lee, place_of_birth, China), in which the three elements are often relevant even though they are mismatched. We therefore desire a more effective unsupervised learning mechanism tailored for KG error detection. To this end, we propose a novel framework, ContrAstive knowledge Graph Error Detection (CAGED). It introduces contrastive learning into KG learning and provides a novel way of modeling KGs. Instead of following the traditional setting, i.e., treating entities as nodes and relations as semantic edges, CAGED augments a KG into different hyper-views by regarding each relational triple as a node. After joint training with a KG embedding loss and a contrastive learning loss, CAGED assesses the trustworthiness of each triple based on two learning signals, i.e., the consistency of triple representations across the multiple views and the self-consistency within the triple. Extensive experiments on three real-world KGs show that CAGED outperforms state-of-the-art methods in KG error detection. Our code and datasets are available at https://github.com/Qing145/CAGED.git.
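To make the abstract's two scoring signals concrete, below is a minimal sketch (not the authors' implementation; see the linked repository for the real code) of how a triple could be encoded as a single "node" vector and scored by (1) the agreement of its representations across two augmented views and (2) a TransE-style self-consistency term. All names here (TripleEncoder, trust_score, the dropout-based view perturbation, the alpha weighting) are hypothetical assumptions for illustration, not the paper's actual architecture or loss.

```python
# Hedged sketch of CAGED-style triple scoring, assuming a TransE-like KG embedding
# and dropout as a stand-in for the paper's hyper-view augmentation.
import torch
import torch.nn.functional as F


class TripleEncoder(torch.nn.Module):
    """Embeds entities/relations and fuses each (h, r, t) triple into one vector,
    mimicking the 'triple as a node' view described in the abstract."""

    def __init__(self, n_entities, n_relations, dim=64):
        super().__init__()
        self.ent = torch.nn.Embedding(n_entities, dim)
        self.rel = torch.nn.Embedding(n_relations, dim)
        self.proj = torch.nn.Linear(3 * dim, dim)  # fuse h, r, t into a triple node

    def forward(self, h, r, t):
        x = torch.cat([self.ent(h), self.rel(r), self.ent(t)], dim=-1)
        return self.proj(x)


def self_consistency(encoder, h, r, t):
    """TransE-style distance ||h + r - t||: smaller means the triple is more
    internally consistent. TransE is only an assumed choice of KG embedding."""
    return torch.norm(encoder.ent(h) + encoder.rel(r) - encoder.ent(t), dim=-1)


def cross_view_consistency(z_view1, z_view2):
    """Cosine similarity between the same triple's embeddings from two augmented
    views; higher means the two views agree on this triple."""
    return F.cosine_similarity(z_view1, z_view2, dim=-1)


def trust_score(encoder, h, r, t, z_view1, z_view2, alpha=0.5):
    """Combine the two signals into one trustworthiness score (higher = more
    trustworthy). The linear weighting is a placeholder, not the paper's formula."""
    return alpha * cross_view_consistency(z_view1, z_view2) \
        - (1 - alpha) * self_consistency(encoder, h, r, t)


if __name__ == "__main__":
    # Toy usage: score two triples under two dropout-perturbed "views".
    enc = TripleEncoder(n_entities=100, n_relations=10)
    h = torch.tensor([0, 1]); r = torch.tensor([2, 3]); t = torch.tensor([4, 5])
    z1 = F.dropout(enc(h, r, t), p=0.1, training=True)  # stand-in for hyper-view 1
    z2 = F.dropout(enc(h, r, t), p=0.1, training=True)  # stand-in for hyper-view 2
    print(trust_score(enc, h, r, t, z1, z2))
```

In the paper's framing, low cross-view agreement or poor internal consistency would push a triple such as (Bruce_Lee, place_of_birth, China) toward the bottom of the trustworthiness ranking; the sketch only illustrates that ranking mechanism, not the joint contrastive training itself.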