The ApposCorpus: A new multilingual, multi-domain dataset for factual appositive generation

Paper title

The ApposCorpus: A new multilingual, multi-domain dataset for factual appositive generation

Authors

Yova Kementchedjhieva, Di Lu, Joel Tetreault

Abstract

News articles, image captions, product reviews and many other texts mention people and organizations whose name recognition could vary for different audiences. In such cases, background information about the named entities could be provided in the form of an appositive noun phrase, either written by a human or generated automatically. We expand on the previous work in appositive generation with a new, more realistic, end-to-end definition of the task, instantiated by a dataset that spans four languages (English, Spanish, German and Polish), two entity types (person and organization) and two domains (Wikipedia and News). We carry out an extensive analysis of the data and the task, pointing to the various modeling challenges it poses. The results we obtain with standard language generation methods show that the task is indeed non-trivial, and leaves plenty of room for improvement.
