Paper Title
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Paper Authors
论文摘要
Paper Abstract
Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human-annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs with weak supervision, without requiring manual annotations of system turn quality. Experiments show that our model is comparable to models trained on human-annotated data. Furthermore, our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
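A minimal sketch of the weak-labeling idea the abstract describes, assuming a generic sentiment signal and a simple conversation-ending detector. The `sentiment_polarity` stub and the `END_PHRASES` list here are hypothetical placeholders (in practice an off-the-shelf sentiment classifier would be used); the paper's actual feature extractors may differ. Each system turn is labeled from the next user utterance: continued, positive engagement counts as a good response, while negative sentiment or an explicit ending counts as a bad one.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for an off-the-shelf sentiment model,
# returning a rough polarity score (positive > 0, negative < 0).
def sentiment_polarity(utterance: str) -> float:
    positive = {"great", "cool", "thanks", "fun", "love"}
    negative = {"boring", "stop", "wrong", "bad"}
    words = set(utterance.lower().split())
    return (len(words & positive) - len(words & negative)) / max(len(words), 1)

# Simple heuristic for explicit conversation endings (illustrative only).
END_PHRASES = ("bye", "goodbye", "stop", "exit")

def is_explicit_ending(utterance: str) -> bool:
    text = utterance.lower().strip()
    return any(text.startswith(p) for p in END_PHRASES)

@dataclass
class Turn:
    speaker: str  # "system" or "user"
    text: str

def weak_label_system_turns(dialog: List[Turn]) -> List[Optional[int]]:
    """Label each system turn 1 (good) / 0 (bad) using the NEXT user turn,
    so no manual annotation of system turn quality is needed."""
    labels: List[Optional[int]] = []
    for i, turn in enumerate(dialog):
        if turn.speaker != "system":
            continue
        if i + 1 >= len(dialog) or dialog[i + 1].speaker != "user":
            labels.append(None)  # no user reply to derive a label from
            continue
        reply = dialog[i + 1].text
        if is_explicit_ending(reply) or sentiment_polarity(reply) < 0:
            labels.append(0)  # user disengaged or reacted negatively
        else:
            labels.append(1)  # user stayed engaged / reacted positively
    return labels

if __name__ == "__main__":
    dialog = [
        Turn("system", "Have you seen any good movies lately?"),
        Turn("user", "Yes, I love sci-fi, that was a fun one!"),
        Turn("system", "What is wrong with you?"),
        Turn("user", "Stop. Goodbye."),
    ]
    print(weak_label_system_turns(dialog))  # -> [1, 0]
```

Labels produced this way can then supervise a response-quality scorer over a large dialog corpus, which is the training signal the abstract refers to.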