Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

Paper Title

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

Authors

Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han

Abstract

Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion aims to induce an extensive set of entities that share the same semantic class (Country in this example) from a given corpus. Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, because no negative sets are provided explicitly, these methods suffer from semantic drift: the seed set expands freely, without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets, closely related to the target set of the user's interest, to serve as negative sets. It then performs multi-set co-expansion, which extracts discriminative features by comparing the target set with the auxiliary sets, forming multiple cohesive sets that are distinct from one another and thereby resolving the semantic drift issue. In this paper we demonstrate that generating auxiliary sets guides the expansion of the target set away from the ambiguous areas along its border with the auxiliary sets, and we show that Set-CoExpan significantly outperforms strong baseline methods.
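The idea of expanding a target set alongside auxiliary negative sets can be illustrated with a toy sketch. This is not the Set-CoExpan implementation: the hand-made three-dimensional vectors, the mean-cosine scoring, and the helper names `cosine`, `set_score`, and `co_expand` are all assumptions made for illustration, and the auxiliary seeds are supplied by hand here rather than generated automatically as in the paper. In each round, a candidate may join a set only if that set is the one it matches best, so entities near the border are claimed by the auxiliary set instead of drifting into the target set.

```python
import math

# Hypothetical toy embeddings: (country-like, state-like, city-like) features.
VECTORS = {
    "USA": (1.0, 0.1, 0.0),
    "Russia": (1.0, 0.0, 0.1),
    "China": (0.9, 0.1, 0.1),
    "Japan": (0.9, 0.0, 0.2),
    "Texas": (0.4, 1.0, 0.0),
    "Ontario": (0.3, 1.0, 0.1),
    "California": (0.5, 0.9, 0.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def set_score(entity, members, vectors):
    """Mean similarity between an entity and the current members of a set."""
    return sum(cosine(vectors[entity], vectors[m]) for m in members) / len(members)


def co_expand(seed_sets, vectors, rounds=3):
    """Grow all sets together; each set adds at most one entity per round.

    A candidate is eligible for a set only if that set scores it highest,
    which is the (assumed, simplified) discriminative step.
    """
    sets = {name: list(seeds) for name, seeds in seed_sets.items()}
    for _ in range(rounds):
        assigned = {e for members in sets.values() for e in members}
        candidates = [e for e in vectors if e not in assigned]
        additions = {}
        for entity in candidates:
            scores = {n: set_score(entity, m, vectors) for n, m in sets.items()}
            best_set = max(scores, key=scores.get)
            current = additions.get(best_set)
            if current is None or scores[best_set] > current[1]:
                additions[best_set] = (entity, scores[best_set])
        if not additions:
            break
        for name, (entity, _) in additions.items():
            sets[name].append(entity)
    return sets


# Free expansion: only the target set, so nothing stops the drift.
free = co_expand({"target": ["USA", "Russia"]}, VECTORS)
# Guided expansion: a hand-picked auxiliary set of states/provinces.
guided = co_expand(
    {"target": ["USA", "Russia"], "auxiliary": ["Texas", "Ontario"]}, VECTORS
)
print("free target:  ", free["target"])
print("guided target:", guided["target"])
```

In this toy setup the unguided run eventually pulls "California" into the country set, while the guided run leaves it to the auxiliary set, which mirrors the semantic-drift argument in the abstract at a very small scale.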
