Paper Title
Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles
Authors
Abstract
Multi-document summarization is a challenging task for which few large-scale datasets exist. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, obtained with several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.