Paper Title
Benchmarking Blocking Algorithms for Web Entities
Authors
Abstract
An increasing number of entities are described by interlinked data rather than by documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one knowledge base or across several knowledge bases in the Web of data. To reduce the number of pairwise comparisons required among descriptions, ER methods typically perform a pre-processing step, called \emph{blocking}, which places similar entity descriptions into blocks so that only descriptions within the same block are compared. We experimentally evaluate several blocking methods proposed for the Web of data using real datasets, whose characteristics significantly affect the methods' effectiveness and efficiency. The proposed experimental evaluation framework allows us to better understand the characteristics of the matching entity descriptions that blocking misses, and to contrast them with ground truth obtained from different kinds of relatedness links.
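To make the blocking step concrete, the following is a minimal sketch of one simple scheme, token blocking, in which two descriptions share a block whenever their attribute values have a token in common. It is an illustration only and is not claimed to be one of the methods evaluated in the paper. The function names, the entity identifiers, and the sample data are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def token_blocking(descriptions):
    """Build one block per token: token -> ids of descriptions containing it.

    `descriptions` maps a description id to a dict of attribute -> string value.
    Attribute names are ignored, so descriptions with different schemas
    can still end up in the same block.
    """
    blocks = defaultdict(set)
    for desc_id, attributes in descriptions.items():
        for value in attributes.values():
            for token in re.findall(r"\w+", value.lower()):
                blocks[token].add(desc_id)
    # A block with a single description yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs of descriptions that co-occur in a block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical descriptions drawn from two made-up knowledge bases.
descriptions = {
    "kb1:e1": {"name": "Stanley Kubrick", "born": "New York"},
    "kb2:e7": {"label": "Kubrick, Stanley", "job": "director"},
    "kb2:e9": {"label": "Martin Scorsese", "birthplace": "New York"},
}

blocks = token_blocking(descriptions)
pairs = candidate_pairs(blocks)
```

In this toy input, `kb2:e7` and `kb2:e9` share no token, so they are never compared: two candidate pairs remain out of the three possible ones. The same mechanism explains how blocking can miss true matches, since two descriptions of the same entity that share no token never land in a common block.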