Large-Scale Multi-Document Summarization with Information Extraction and Compression

Paper Title

Large-Scale Multi-Document Summarization with Information Extraction and Compression

Authors

Wang, Ning; Liu, Han; Klabjan, Diego

Abstract

We develop an abstractive summarization framework for multiple heterogeneous documents that does not depend on labeled data. Unlike existing multi-document summarization methods, our framework processes documents that tell different stories rather than documents on the same topic. We also enhance an existing sentence fusion method with a unidirectional language model that prioritizes fused sentences with higher sentence probability, with the goal of improving readability. Lastly, we construct twelve dataset variations based on the CNN/Daily Mail and NewsRoom datasets, in which each document group contains a large and diverse collection of documents, and use them to evaluate our model against baseline systems. Our experiments demonstrate that our framework outperforms current state-of-the-art methods in this more generic setting.
