Paper Title

Building an Icelandic Entity Linking Corpus

Authors

Steinunn Rut Friðriksdóttir, Valdimar Ágúst Eggertsson, Benedikt Geir Jóhannesson, Hjalti Daníelsson, Hrafn Loftsson, Hafsteinn Einarsson

Abstract

In this paper, we present the first Entity Linking corpus for Icelandic. We describe our approach of using a multilingual entity linking model (mGENRE) in combination with Wikipedia API Search (WAPIS) to label our data and compare it to an approach using WAPIS only. We find that our combined method reaches 53.9% coverage on our corpus, compared to 30.9% using only WAPIS. We analyze our results and explain the value of using a multilingual system when working with Icelandic. Additionally, we analyze the data that remain unlabeled, identify patterns and discuss why they may be more difficult to annotate.
