Evaluating Conversational Recommender Systems via User Simulation

Paper title

Evaluating Conversational Recommender Systems via User Simulation

Authors

Shuo Zhang, Krisztian Balog

Abstract

Conversational information access is an emerging research area. Currently, human evaluation is used for end-to-end system evaluation, which is both time- and resource-intensive at scale and thus becomes a bottleneck for progress. As an alternative, we propose automated evaluation by means of simulating users. Our user simulator aims to generate the responses that a real human would give by considering both individual preferences and the general flow of interaction with the system. We evaluate our simulation approach on an item recommendation task by comparing three existing conversational recommender systems. We show that preference modeling and task-specific interaction models both contribute to more realistic simulations and can help achieve high correlation between automatic evaluation measures and manual human assessments.
