Paper Title
Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization
Paper Authors
Paper Abstract
The problems of unfaithful summaries have been widely discussed in the context of abstractive summarization. Though extractive summarization is less prone to the common unfaithfulness issues of abstractive summaries, does that mean extractive is equal to faithful? It turns out that the answer is no. In this work, we define a typology with five types of broad unfaithfulness problems (including and beyond not-entailment) that can appear in extractive summaries: incorrect coreference, incomplete coreference, incorrect discourse, incomplete discourse, and other misleading information. We ask humans to label these problems in 1600 English summaries produced by 16 diverse extractive systems. We find that 30% of the summaries have at least one of the five issues. To automatically detect these problems, we find that 5 existing faithfulness evaluation metrics for summarization have poor correlations with human judgment. To remedy this, we propose a new metric, ExtEval, that is designed for detecting unfaithful extractive summaries and is shown to have the best performance. We hope our work can increase awareness of unfaithfulness problems in extractive summarization and help future work to evaluate and resolve these issues. Our data and code are publicly available at https://github.com/ZhangShiyue/extractive_is_not_faithful