Paper Title
Named Entity Recognition for Social Media Texts with Semantic Augmentation
Paper Authors
Abstract
Existing approaches to named entity recognition (NER) suffer from data sparsity when applied to short and informal texts, especially user-generated social media content. Semantic augmentation is a potential way to alleviate this problem. Because rich semantic information is implicitly preserved in pre-trained word embeddings, they are a potentially ideal resource for semantic augmentation. In this paper, we propose a neural approach to NER for social media texts that takes into account both local semantics (from the running text) and augmented semantics. In particular, we obtain the augmented semantic information from a large-scale corpus, and propose an attentive semantic augmentation module and a gate module to encode and aggregate such information, respectively. Extensive experiments on three benchmark datasets collected from English and Chinese social media platforms show that our approach outperforms previous studies on all three datasets.
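The abstract gives no formulas, so the following is only a rough sketch of what an attention-then-gate pipeline of this kind might look like, not the authors' implementation. Everything specific here is an assumption made for illustration: the toy `embeddings` table, cosine similarity for picking similar words, dot-product attention, the element-wise (diagonal) gate weights `w_h`, `w_v`, `bias`, and the choice to give the hidden vector the same dimension as the embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_similar(word, embeddings, k):
    """Assumed retrieval step: the k words closest to `word` in embedding space."""
    query = embeddings[word]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden, neighbor_vecs):
    """Assumed attention step: weight each neighbor by its dot product with
    the token's hidden vector, then return the weighted sum."""
    scores = [sum(h * n for h, n in zip(hidden, vec)) for vec in neighbor_vecs]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, neighbor_vecs))
            for i in range(len(hidden))]

def gate(hidden, augmented, w_h, w_v, bias):
    """Assumed gate: g = sigmoid(w_h*h + w_v*v + b), output = g*h + (1-g)*v,
    computed element-wise (a diagonal simplification of a learned gate)."""
    fused = []
    for h, v, a, b, c in zip(hidden, augmented, w_h, w_v, bias):
        g = 1.0 / (1.0 + math.exp(-(a * h + b * v + c)))
        fused.append(g * h + (1.0 - g) * v)
    return fused

# Toy, made-up embeddings; a real system would load pre-trained vectors.
embeddings = {
    "nyc": [0.9, 0.1],
    "newyork": [0.8, 0.2],
    "manhattan": [0.7, 0.3],
    "pizza": [0.1, 0.9],
}
neighbors = top_k_similar("nyc", embeddings, k=2)
hidden = [0.5, 0.5]  # stand-in for an encoder's output for the token "nyc"
augmented = attend(hidden, [embeddings[w] for w in neighbors])
fused = gate(hidden, augmented, w_h=[1.0, 1.0], w_v=[1.0, 1.0], bias=[0.0, 0.0])
```

In this sketch the gate produces, per dimension, a convex combination of the local representation and the augmented one, so the fused vector always lies between the two; in a trained model the gate parameters would be learned rather than fixed by hand.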