Paper Title

Bootstrapping Named Entity Recognition in E-Commerce with Positive Unlabeled Learning

Authors

Hanchu Zhang, Leonhard Hennig, Christoph Alt, Changjian Hu, Yao Meng, Chao Wang

Abstract

Named Entity Recognition (NER) in domains like e-commerce is an understudied problem due to the lack of annotated datasets. Recognizing novel entity types in this domain, such as products, components, and attributes, is challenging because of their linguistic complexity and the low coverage of existing knowledge resources. To address this problem, we present a bootstrapped positive-unlabeled learning algorithm that integrates domain-specific linguistic features to quickly and efficiently expand the seed dictionary. The model achieves an average F1 score of 72.02% on a novel dataset of product descriptions, an improvement of 3.63% over a baseline BiLSTM classifier, and in particular exhibits better recall (4.96% on average).
