Paper Title
Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction
Paper Authors
Paper Abstract
In recent years, distantly-supervised relation extraction has achieved a certain degree of success by using deep neural networks. Distant Supervision (DS) can automatically generate large-scale annotated data by aligning entity pairs from Knowledge Bases (KBs) to sentences. However, these DS-generated datasets inevitably contain wrong labels, which result in incorrect evaluation scores during testing and may mislead researchers. To address this problem, we build a new dataset, NYT-H, in which we use the DS-generated data as training data and hire annotators to label the test data. Compared with previous datasets, NYT-H has a much larger test set, so we can perform more accurate and consistent evaluation. Finally, we present the experimental results of several widely used systems on NYT-H. The results show that the ranking lists of the comparison systems differ between the DS-labelled test data and the human-annotated test data. This indicates that our human-annotated data is necessary for the evaluation of distantly-supervised relation extraction.
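
To make the DS labelling mechanism concrete, here is a minimal Python sketch of the alignment idea the abstract describes: a sentence mentioning an entity pair that holds a relation in the KB is labelled with that relation. The KB triples, example sentences, and the `ds_label` helper are hypothetical illustrations, not part of the paper or the NYT-H data.

```python
# Hypothetical KB: (head entity, tail entity) -> relation
kb = {
    ("Barack Obama", "Honolulu"): "/people/person/place_of_birth",
    ("Apple", "Cupertino"): "/business/company/headquarters",
}

def ds_label(sentence: str, entity_pair: tuple) -> str:
    """Assign a DS label: the KB relation if the pair is in the KB, else NA.

    This alignment step is exactly where DS noise comes from -- a sentence
    can mention both entities without actually expressing the KB relation.
    """
    head, tail = entity_pair
    if head in sentence and tail in sentence:
        return kb.get((head, tail), "NA")
    return "NA"

sentences = [
    "Barack Obama was born in Honolulu.",       # DS label happens to be correct
    "Barack Obama gave a speech in Honolulu.",  # same DS label, but wrong: noise
]
for s in sentences:
    print(ds_label(s, ("Barack Obama", "Honolulu")))
```

Both sentences receive the same DS label, although only the first actually expresses the relation; this is the kind of wrong test label that motivates the human-annotated test set in NYT-H.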