Paper Title

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

Authors

Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, Federico Nanni

Abstract

Recognizing toponyms and resolving them to their real-world referents is required for providing advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a toponym previously recognized. While it has traditionally received little attention in the research community, it has been shown that candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a flexible deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several new realistic datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors). We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.
