USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Paper Title

USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Authors

Shikib Mehri, Maxine Eskenazi

Abstract

The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR is a reference-free metric that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48 and system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.
