Paper Title

Automatic Context Pattern Generation for Entity Set Expansion

Authors

Yinghui Li, Shulin Huang, Xinwei Zhang, Qingyu Zhou, Yangning Li, Ruiyang Liu, Yunbo Cao, Hai-Tao Zheng, Ying Shen

Abstract


Entity Set Expansion (ESE) is a valuable task that aims to find entities of the target semantic class described by given seed entities. Various Natural Language Processing (NLP) and Information Retrieval (IR) downstream applications have benefited from ESE due to its ability to discover knowledge. Although existing corpus-based ESE methods have achieved great progress, they still rely on corpora annotated with high-quality entity information, because most of them need to obtain context patterns from the position of an entity in a sentence. Therefore, the quality of the given corpora and their entity annotations has become the bottleneck that limits the performance of such methods. To overcome this dilemma and free ESE models from their dependence on entity annotation, our work explores a new ESE paradigm, namely corpus-independent ESE. Specifically, we devise a context pattern generation module that utilizes autoregressive language models (e.g., GPT-2) to automatically generate high-quality context patterns for entities. In addition, we propose GAPA, a novel ESE framework that leverages the aforementioned GenerAted PAtterns to expand target entities. Extensive experiments and detailed analyses on three widely used datasets demonstrate the effectiveness of our method. All code for our experiments is available at https://github.com/geekjuruo/GAPA.
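To make the corpus-dependence concrete: the abstract notes that corpus-based ESE methods derive context patterns from the position of an annotated entity in a sentence. The sketch below is a minimal, hypothetical illustration of that idea (the function name and the `<ENTITY>` placeholder are our own, not part of the paper's implementation); the paper's contribution is to *generate* such patterns with an autoregressive language model instead, removing the need for annotated corpora.

```python
def context_pattern(sentence: str, entity: str, slot: str = "<ENTITY>") -> str:
    """Form a context pattern by replacing the annotated entity mention
    with a placeholder slot, as corpus-based ESE methods do."""
    return sentence.replace(entity, slot)

# A corpus-based method needs the sentence AND the entity annotation:
print(context_pattern("Paris is the capital of France.", "Paris"))
# -> <ENTITY> is the capital of France.
```

A corpus-independent approach like GAPA would instead prompt a language model with the entity itself to produce patterns of this shape directly, which is why no annotated corpus is required.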
