Paper Title

Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning

Authors

Kang Zhou, Yuepei Li, Qi Li

Abstract

In this paper, we study the named entity recognition (NER) problem under distant supervision. Due to the incompleteness of the external dictionaries and/or knowledge bases, such distantly annotated training data usually suffer from a high false negative rate. To this end, we formulate the Distantly Supervised NER (DS-NER) problem via Multi-class Positive and Unlabeled (MPU) learning and propose a theoretically and practically novel CONFidence-based MPU (Conf-MPU) approach. To handle the incomplete annotations, Conf-MPU consists of two steps. First, a confidence score is estimated for each token of being an entity token. Then, the proposed Conf-MPU risk estimation is applied to train a multi-class classifier for the NER task. Thorough experiments on two benchmark datasets labeled by various external knowledge demonstrate the superiority of the proposed Conf-MPU over existing DS-NER methods.
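To make the false negative problem concrete, the following is a minimal sketch of dictionary-based distant labeling. It is not the authors' pipeline: the function `distant_label`, the toy dictionary, the sentence, and the gold labels are all hypothetical and chosen only for illustration. It shows how an entity missing from an incomplete dictionary is silently labeled as a non-entity.

```python
from typing import Dict, List


def distant_label(tokens: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Label each token by exact dictionary lookup; unmatched tokens get 'O'."""
    return [dictionary.get(token, "O") for token in tokens]


# Hypothetical, deliberately incomplete dictionary: "Berlin" is missing.
toy_dictionary = {"Paris": "LOC", "Alice": "PER"}

tokens = ["Alice", "flew", "from", "Paris", "to", "Berlin"]
gold = ["PER", "O", "O", "LOC", "O", "LOC"]  # hypothetical gold labels

distant = distant_label(tokens, toy_dictionary)

# A false negative is a true entity token that distant labeling marks as 'O'.
false_negatives = [
    token
    for token, d, g in zip(tokens, distant, gold)
    if g != "O" and d == "O"
]
n_entities = sum(g != "O" for g in gold)
false_negative_rate = len(false_negatives) / n_entities

print(distant)
print(false_negatives, false_negative_rate)
```

In this toy setup, the tokens labeled `O` are better read as unlabeled than as confirmed negatives, which is the view that motivates treating DS-NER as a positive and unlabeled learning problem.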
