Paper Title

WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

Authors

Hayashi, Hiroaki; Budania, Prashant; Wang, Peng; Ackerson, Chris; Neervannan, Raj; Neubig, Graham

Abstract

Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
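The key construction idea in the abstract is to treat each section title as an aspect label and each section's boundaries as the extent of that aspect's summary. The minimal Python sketch below illustrates only this labeling step. It assumes raw MediaWiki wikitext with level-2 `== Title ==` headings. The function name `split_into_aspects` and the heading regex are hypothetical, and the sketch is not the authors' released pipeline.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for top-level wikitext headings such as "== Early life ==".
# Assumption: only level-2 headings delimit aspects; "=== ... ===" subsections
# are kept inside the body of their parent section.
HEADING = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_into_aspects(wikitext: str) -> List[Tuple[str, str]]:
    """Return (aspect, section_text) pairs: the section title serves as a proxy
    aspect label, and the text up to the next heading is that aspect's summary."""
    matches = list(HEADING.finditer(wikitext))
    pairs = []
    for i, m in enumerate(matches):
        start = m.end()
        # A section ends where the next level-2 heading begins (or at end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        body = wikitext[start:end].strip()
        if body:  # skip empty sections such as a bare "See also"
            pairs.append((m.group(1), body))
    # Note: any lead text before the first heading is ignored in this sketch.
    return pairs
```

For example, an article with sections "Early life", "Career", and an empty "See also" yields two (aspect, text) pairs. The "Family" subsection stays inside "Early life" because the sketch splits only on level-2 headings.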