Paper Title
FlexER: Flexible Entity Resolution for Multiple Intents
Paper Authors
Paper Abstract
Entity resolution, a longstanding problem of data cleaning and integration, aims at identifying data records that represent the same real-world entity. Existing approaches treat entity resolution as a universal task, assuming the existence of a single interpretation of a real-world entity and focusing only on finding matched records, separating corresponding from non-corresponding ones, with respect to this single interpretation. However, in real-world scenarios, where entity resolution is part of a more general data project, downstream applications may have varying interpretations of real-world entities relating, for example, to various user needs. In what follows, we introduce the problem of multiple intents entity resolution (MIER), an extension to the universal (single intent) entity resolution task. As a solution, we propose FlexER, utilizing contemporary solutions to universal entity resolution tasks to solve multiple intents entity resolution. FlexER addresses the problem as a multi-label classification problem. It combines intent-based representations of tuple pairs using a multiplex graph representation that serves as an input to a graph neural network (GNN). FlexER learns intent representations and improves the outcomes of multiple resolution problems. A large-scale empirical evaluation introduces a new benchmark and, also using two well-known benchmarks, shows that FlexER effectively solves the MIER problem and outperforms the state of the art for universal entity resolution.
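To make the multi-label framing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy dataset and two hypothetical intents (`same_product`, `same_brand`), assigns each tuple pair one binary label per intent, and then lays out a multiplex structure with one layer per intent, where inter-layer edges connect the copies of the same pair. The record fields, the intent rules, and the helper names `label_pairs` and `multiplex_nodes_and_edges` are all assumptions made for illustration. The abstract does not specify how edges are defined, so only inter-layer edges are shown.

```python
from itertools import combinations

# Hypothetical toy records; fields and values are illustrative only.
records = [
    {"id": 0, "brand": "Acme", "model": "X1"},
    {"id": 1, "brand": "Acme", "model": "X1"},
    {"id": 2, "brand": "Acme", "model": "Z9"},
    {"id": 3, "brand": "Bolt", "model": "Z9"},
]

# Two hypothetical intents, each a different interpretation of "same entity".
intents = {
    "same_product": lambda a, b: a["brand"] == b["brand"] and a["model"] == b["model"],
    "same_brand": lambda a, b: a["brand"] == b["brand"],
}


def label_pairs(records, intents):
    """Return {(id_a, id_b): [0/1 per intent]}: one multi-label target per tuple pair."""
    labels = {}
    for a, b in combinations(records, 2):
        labels[(a["id"], b["id"])] = [int(rule(a, b)) for rule in intents.values()]
    return labels


def multiplex_nodes_and_edges(pairs, n_intents):
    """Build one node per (pair, intent layer) and link the same pair across layers.

    Assumption: only inter-layer edges are created here; how intra-layer edges
    would be chosen is not stated in the abstract.
    """
    nodes = [(p, k) for k in range(n_intents) for p in pairs]
    edges = [
        ((p, i), (p, j))
        for p in pairs
        for i, j in combinations(range(n_intents), 2)
    ]
    return nodes, edges


labels = label_pairs(records, intents)
nodes, edges = multiplex_nodes_and_edges(list(labels), len(intents))
```

In this toy setup the pair `(0, 2)` is a match under `same_brand` but not under `same_product`, which is the kind of disagreement between intents that a single-intent resolver cannot express. In a learned system, the hand-written rules would be replaced by per-intent pair representations, and the node and edge lists would feed a graph neural network.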