Paper title

USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Authors

Shikib Mehri, Maxine Eskenazi

Abstract

对话框缺乏有意义的自动评估指标阻碍了开放域对话研究。标准语言产生指标已被证明无法评估对话模型。为此,本文介绍了USR,这是对话框的无监督和无参考评估指标。 USR是一个无参考的指标,它训练无监督的模型来衡量几种理想的对话品质。 USR显示与人类对局部聊天(转向级:0.42,系统级别:1.0)和人为ACHAT(转交级:0.48和系统级别:1.0)的判断密切相关。 USR还为对话的几种理想属性产生了可解释的措施。

The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR is a reference-free metric that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48 and system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.
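
The abstract reports two kinds of correlation with human judgment: turn-level (one metric score and one human rating per response) and system-level (scores averaged per dialog system before correlating). The sketch below shows how these two numbers are commonly computed. It assumes Spearman rank correlation and simple per-system averaging; the record fields ("system", "metric", "human") and the toy data are hypothetical, and this is not the paper's own evaluation code. Under the Spearman assumption, a system-level value of 1.0 means the metric ranks the systems in exactly the same order as the human judges, which can happen even when turn-level agreement is far from perfect.

```python
from collections import defaultdict
from statistics import mean


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant (else division by zero)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def turn_level(records):
    """Correlate metric and human scores response by response."""
    return spearman([r["metric"] for r in records], [r["human"] for r in records])


def system_level(records):
    """Average scores per system first, then correlate the per-system means."""
    by_system = defaultdict(lambda: ([], []))
    for r in records:
        by_system[r["system"]][0].append(r["metric"])
        by_system[r["system"]][1].append(r["human"])
    systems = sorted(by_system)
    metric_avg = [mean(by_system[s][0]) for s in systems]
    human_avg = [mean(by_system[s][1]) for s in systems]
    return spearman(metric_avg, human_avg)


# Hypothetical toy data: three systems, three rated responses each.
RECORDS = [
    {"system": s, "metric": m, "human": h}
    for s, m, h in [
        ("A", 0.2, 2), ("A", 0.5, 1), ("A", 0.3, 3),
        ("B", 0.6, 3), ("B", 0.4, 4), ("B", 0.7, 2),
        ("C", 0.9, 4), ("C", 0.8, 5), ("C", 0.6, 5),
    ]
]

print("turn-level:", round(turn_level(RECORDS), 3))
print("system-level:", system_level(RECORDS))
```

On this toy data the per-response agreement is only moderate, yet the three system averages are ordered identically by the metric and by the human ratings, so the system-level correlation comes out as 1.0.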