Paper title
Counterfactual Data Augmentation improves Factuality of Abstractive Summarization
Paper authors
Paper abstract
Abstractive summarization systems based on pretrained language models often generate coherent but factually inconsistent sentences. In this paper, we present a counterfactual data augmentation approach in which we augment the data with perturbed summaries that increase the diversity of the training data. Specifically, we present three augmentation approaches based on replacing (i) entities with entities from other categories or from the same category and (ii) nouns with their corresponding WordNet hypernyms. We show that augmenting the training data with our approach improves the factual correctness of summaries without significantly affecting the ROUGE score. We show that on two commonly used summarization datasets (CNN/DailyMail and XSum), we improve the factual correctness by about 2.5 points on average.
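The three perturbations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the entity pool, the hypernym table, and the function names `swap_entity` and `replace_with_hypernym` are hypothetical stand-ins for a real named-entity tagger and for WordNet, and how the perturbed summaries are used during training is not specified here.

```python
import random
import re

# Hypothetical toy resources (assumptions for illustration only).
# A real pipeline would take entities from an NER tagger and hypernyms from WordNet.
ENTITY_POOL = {
    "PERSON": ["Alice Smith", "Bob Jones"],
    "GPE": ["Paris", "Tokyo"],
}
HYPERNYMS = {"dog": "canine", "car": "motor vehicle"}


def _replace_word(text, old, new):
    """Replace whole-word occurrences of `old` with `new`."""
    pattern = r"\b" + re.escape(old) + r"\b"
    return re.sub(pattern, lambda _match: new, text)


def swap_entity(summary, entity, category, same_category, rng):
    """Swap `entity` for a different entity from the same or from another category."""
    if same_category:
        candidates = [e for e in ENTITY_POOL[category] if e != entity]
    else:
        candidates = [
            e
            for cat, entities in ENTITY_POOL.items()
            if cat != category
            for e in entities
        ]
    return _replace_word(summary, entity, rng.choice(candidates))


def replace_with_hypernym(summary, noun):
    """Replace `noun` with its hypernym from the toy table."""
    return _replace_word(summary, noun, HYPERNYMS[noun])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbations are reproducible
    summary = "Alice Smith drove a car to Paris."
    print(swap_entity(summary, "Alice Smith", "PERSON", True, rng))
    print(swap_entity(summary, "Paris", "GPE", False, rng))
    print(replace_with_hypernym(summary, "car"))
```

Whole-word matching is used so that replacing "car" does not also alter words such as "carpet"; a production version would operate on tagged token spans rather than on raw strings.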