Paper Title


Analyzing and Evaluating Faithfulness in Dialogue Summarization

Paper Authors

Bin Wang, Chen Zhang, Yan Zhang, Yiming Chen, Haizhou Li

Abstract


Dialogue summarization is abstractive in nature, making it prone to factual errors. The factual correctness of summaries has the highest priority before practical application. Many efforts have been made to improve faithfulness in text summarization, but there is a lack of systematic study of dialogue summarization systems. In this work, we first perform a fine-grained human analysis of the faithfulness of dialogue summaries and observe that over 35% of generated summaries are factually inconsistent with the source dialogues. Furthermore, we present a new model-level faithfulness evaluation method. It examines generation models with multiple-choice questions created by rule-based transformations. Experimental results show that our evaluation schema is a strong proxy for the factual correctness of summarization models. The human-annotated faithfulness samples and the evaluation toolkit are released to facilitate future research toward faithful dialogue summarization.
