Evaluating Mixed-initiative Conversational Search Systems via User Simulation

Paper Title

Evaluating Mixed-initiative Conversational Search Systems via User Simulation

Authors

Ivan Sekulić, Mohammad Aliannejadi, Fabio Crestani

Abstract

Clarifying the underlying user information need by asking clarifying questions is an important feature of modern conversational search systems. However, evaluating such systems by answering the prompted clarifying questions requires significant human effort, which can be time-consuming and expensive. In this paper, we propose a conversational User Simulator, called USi, for automatic evaluation of such conversational search systems. Given a description of an information need, USi is capable of automatically answering clarifying questions about the topic throughout the search session. Through a set of experiments, including automated natural language generation metrics and crowdsourcing studies, we show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers. Moreover, we take the first steps towards multi-turn interactions, where a conversational search system asks multiple questions to the (simulated) user with the goal of clarifying the user need. To this end, we expand on currently available datasets for studying clarifying questions, i.e., Qulac and ClariQ, by performing a crowdsourcing-based multi-turn data acquisition. We show that our generative, GPT2-based model is capable of providing accurate and natural answers to unseen clarifying questions in the single-turn setting, and we discuss the capabilities of our model in the multi-turn setting. We provide the code, data, and the pre-trained model to be used for further research on the topic.
