Paper Title
Evaluating Factuality in Generation with Dependency-level Entailment
Paper Authors
Paper Abstract
Despite significant progress in text generation models, a serious limitation is their tendency to produce text that is factually inconsistent with information in the input. Recent work has studied whether textual entailment systems can be used to identify factual errors; however, these sentence-level entailment models are trained to solve a different problem than generation filtering and they do not localize which part of a generation is non-factual. In this paper, we propose a new formulation of entailment that decomposes it at the level of dependency arcs. Rather than focusing on aggregate decisions, we instead ask whether the semantic relationship manifested by individual dependency arcs in the generated output is supported by the input. Human judgments on this task are difficult to obtain; we therefore propose a method to automatically create data based on existing entailment or paraphrase corpora. Experiments show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods or those based on question generation, while additionally localizing the erroneous parts of the generation.
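To illustrate the arc-level decomposition described in the abstract, the minimal sketch below parses a generated sentence and enumerates its dependency arcs, each of which would then be checked for support against the input. It assumes spaCy with the en_core_web_sm model and a made-up example sentence; the paper's actual parser, arc filtering, and arc-entailment classifier are not shown here.

```python
# Minimal sketch (not the authors' code): decompose a generated sentence into
# dependency arcs so each arc can be judged as supported / unsupported by the input.
# Assumes spaCy with the en_core_web_sm model installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_arcs(sentence):
    """Return (head, relation, dependent) triples for every dependency arc."""
    doc = nlp(sentence)
    return [
        (tok.head.text, tok.dep_, tok.text)
        for tok in doc
        if tok.dep_ != "ROOT"  # the root token has no incoming arc to judge
    ]

# Hypothetical generated output; in practice this would come from a
# paraphrasing or summarization model.
generated = "The company announced record profits in 2019."
for head, rel, dep in dependency_arcs(generated):
    # Each arc is a candidate unit for an entailment decision against the input,
    # e.g., is the relation (announced --dobj--> profits) supported by the source?
    print(f"{head} --{rel}--> {dep}")
```

Judging arcs rather than whole sentences is what allows the model to localize which span of the generation is non-factual, since an unsupported arc points directly at the offending words.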