Paper Title

GO FIGURE: A Meta Evaluation of Factuality in Summarization

Paper Authors

Saadia Gabriel, Asli Celikyilmaz, Rahul Jha, Yejin Choi, Jianfeng Gao

Paper Abstract

While neural language models can generate text with remarkable fluency and coherence, controlling for factual correctness in generation remains an open research question. This major discrepancy between the surface-level fluency and the content-level correctness of neural generation has motivated a new line of research that seeks automatic metrics for evaluating the factuality of machine text. In this paper, we introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics. We propose five necessary and intuitive conditions to evaluate factuality metrics on diagnostic factuality data across three different summarization tasks. Our benchmark analysis on ten factuality metrics reveals that our meta-evaluation framework provides a robust and efficient evaluation that is extensible to multiple types of factual consistency and standard generation metrics, including QA metrics. It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
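The abstract's core idea is to test the metrics themselves: a trustworthy factuality metric should respond sensibly to controlled, diagnostic corruptions of a summary. A minimal sketch of one such sanity condition is below; it checks that a metric's score does not increase as more factual errors are injected. All names here (the toy token-overlap metric, the diagnostic summaries) are illustrative assumptions, not the GO FIGURE implementation or its actual conditions.

```python
# Hypothetical sketch of a meta-evaluation sensitivity check: a factuality
# metric's score should be non-increasing as the number of injected factual
# errors in a summary grows. The metric and data below are toy examples.

def token_overlap_metric(summary: str, source: str) -> float:
    """Toy 'factuality' metric: fraction of summary tokens present in the source."""
    src_tokens = set(source.lower().split())
    toks = summary.lower().split()
    return sum(t in src_tokens for t in toks) / len(toks)

def is_monotonically_sensitive(metric, source: str, summaries_by_error_level) -> bool:
    """Check that scores do not increase as the injected-error level rises."""
    scores = [metric(s, source) for s in summaries_by_error_level]
    return all(a >= b for a, b in zip(scores, scores[1:]))

source = "the cat sat on the mat near the door"
# Diagnostic summaries with 0, 1, and 2 injected factual errors.
levels = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the dog sat on the rug",
]
print(is_monotonically_sensitive(token_overlap_metric, source, levels))  # True
```

A real meta-evaluation would apply checks like this across many documents and error types (entity swaps, negations, number changes), which is the role the diagnostic factuality data plays in the paper's framework.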
