Paper Title
Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction
Paper Authors
Paper Abstract
In recent years, distantly-supervised relation extraction has achieved a certain degree of success by using deep neural networks. Distant Supervision (DS) can automatically generate large-scale annotated data by aligning entity pairs from Knowledge Bases (KBs) to sentences. However, these DS-generated datasets inevitably contain wrong labels, which result in incorrect evaluation scores during testing and may mislead researchers. To address this problem, we build a new dataset, NYT-H, in which we use the DS-generated data as training data and hire annotators to label the test data. Compared with previous datasets, NYT-H has a much larger test set, so we can perform more accurate and consistent evaluation. Finally, we present the experimental results of several widely used systems on NYT-H. The results show that the ranking lists of the comparison systems differ between the DS-labelled test data and the human-annotated test data. This indicates that our human-annotated data is necessary for the evaluation of distantly-supervised relation extraction.
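
To make the DS labelling mechanism concrete, here is a minimal Python sketch of the alignment idea the abstract describes: a sentence mentioning an entity pair that holds a relation in the KB is labelled with that relation. The KB triples, example sentences, and the `ds_label` helper are hypothetical illustrations, not part of the paper or the NYT-H data.

```python
# Hypothetical KB: (head entity, tail entity) -> relation
kb = {
    ("Barack Obama", "Honolulu"): "/people/person/place_of_birth",
    ("Apple", "Cupertino"): "/business/company/headquarters",
}

def ds_label(sentence: str, entity_pair: tuple) -> str:
    """Assign a DS label: the KB relation if the pair is in the KB, else NA.

    This alignment step is exactly where DS noise comes from -- a sentence
    can mention both entities without actually expressing the KB relation.
    """
    head, tail = entity_pair
    if head in sentence and tail in sentence:
        return kb.get((head, tail), "NA")
    return "NA"

sentences = [
    "Barack Obama was born in Honolulu.",       # DS label happens to be correct
    "Barack Obama gave a speech in Honolulu.",  # same DS label, but wrong: noise
]
for s in sentences:
    print(ds_label(s, ("Barack Obama", "Honolulu")))
```

Both sentences receive the same DS label, although only the first actually expresses the relation; this is the kind of wrong test label that motivates the human-annotated test set in NYT-H.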