Extractive Summarization as Text Matching

Paper title

Extractive Summarization as Text Matching

Authors

Ming Zhong, Pengfei Liu, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang

Abstract

This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems. Instead of following the commonly used framework of extracting sentences individually and modeling the relationship between sentences, we formulate the extractive summarization task as a semantic text matching problem, in which a source document and candidate summaries (extracted from the original text) are matched in a semantic space. Notably, this paradigm shift to a semantic matching framework is well-grounded in our comprehensive analysis of the inherent gap between sentence-level and summary-level extractors based on the properties of the dataset. Moreover, even when instantiating the framework with a simple form of matching model, we drive the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1). Experiments on five other datasets also show the effectiveness of the matching framework. We believe the power of this matching-based summarization framework has not been fully exploited. To encourage more instantiations in the future, we have released our code, processed datasets, and generated summaries at https://github.com/maszhongming/MatchSum.
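
The matching formulation in the abstract can be illustrated with a minimal sketch: enumerate candidate summaries as combinations of document sentences, embed the document and each candidate in a shared vector space, and keep the candidate closest to the document. The abstract does not specify the encoder or the scoring function, so everything below is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a learned encoder, cosine similarity stands in for the matching score, and the helper names (`embed`, `cosine`, `best_candidate`) are hypothetical and not taken from the released code.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed(text):
    """Toy stand-in for a learned encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_candidate(sentences, summary_size=2):
    """Score each candidate summary against the whole document; return the best.

    A candidate is a combination of `summary_size` sentences, so the score is
    assigned at the summary level rather than sentence by sentence.
    """
    doc_vec = embed(" ".join(sentences))
    best_score, best_indices = -1.0, ()
    for indices in combinations(range(len(sentences)), summary_size):
        candidate = " ".join(sentences[i] for i in indices)
        score = cosine(doc_vec, embed(candidate))
        if score > best_score:
            best_score, best_indices = score, indices
    return list(best_indices), best_score


if __name__ == "__main__":
    doc = [
        "The council approved the new transit budget on Monday.",
        "The budget adds three bus routes and extends rail service.",
        "Officials thanked residents for attending the meeting.",
    ]
    indices, score = best_candidate(doc, summary_size=2)
    print(indices, round(score, 3))
```

Exhaustive enumeration grows combinatorially with document length, so a practical system would need to prune the candidate set first; the sketch omits that step to keep the matching idea visible.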
