Paper Title

Learning to Expand: Reinforced Pseudo-relevance Feedback Selection for Information-seeking Conversations

Authors

Haojie Pan, Cen Chen, Chengyu Wang, Minghui Qiu, Liu Yang, Feng Ji, Jun Huang

Abstract

Information-seeking conversation systems are increasingly popular in real-world applications, especially for e-commerce companies. To retrieve appropriate responses for users, it is necessary to compute the matching degrees between candidate responses and users' queries together with historical dialogue utterances. As the contexts are usually much longer than the responses, it is necessary to expand the (usually short) responses with richer information. Recent studies on pseudo-relevance feedback (PRF) have demonstrated its effectiveness in query expansion for search engines; hence we consider expanding responses using PRF information. However, existing PRF approaches are either based on heuristic rules or require heavy manual labeling, which makes them unsuitable for our task. To alleviate this problem, we treat PRF selection for response expansion as a learning task and propose a reinforced learning method that can be trained in an end-to-end manner without any human annotations. More specifically, we propose a reinforced selector to extract useful PRF terms to enhance response candidates, and a BERT-based response ranker to rank the PRF-enhanced responses. The performance of the ranker serves as a reward to guide the selector to extract useful PRF terms, which boosts the overall task performance. Extensive experiments on both standard benchmarks and commercial datasets demonstrate the superiority of our reinforced PRF term selector over other potential soft or hard selection methods. Both case studies and quantitative analysis show that our model is capable of selecting meaningful PRF terms to expand response candidates, and it achieves the best results compared with all baselines on a variety of evaluation metrics. We have also deployed our method in online production at an e-commerce company, where it shows a significant improvement over the existing online ranking system.
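To make the selector-ranker loop concrete, below is a minimal sketch of REINFORCE-style PRF term selection under several simplifying assumptions: all names (PRFSelector, rank_score, reinforce_step) are hypothetical and not from the paper, the BERT-based ranker is replaced by a black-box scoring stub, and response expansion is reduced to adding selected term embeddings. It only illustrates the training signal described in the abstract, where the ranker's score acts as the reward for the term selector.

```python
# Hypothetical sketch of reinforced PRF term selection (not the authors' code).
import torch
import torch.nn as nn

class PRFSelector(nn.Module):
    """Scores each candidate PRF term; a Bernoulli sample decides inclusion."""
    def __init__(self, term_dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(term_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, term_embs: torch.Tensor) -> torch.Tensor:
        # term_embs: (num_terms, term_dim) -> keep-probability per term (num_terms,)
        return torch.sigmoid(self.scorer(term_embs)).squeeze(-1)

def rank_score(expanded_response_emb: torch.Tensor) -> torch.Tensor:
    # Stand-in for the BERT-based ranker's matching score (a black box here);
    # in the paper this would be the ranker evaluated on the PRF-enhanced response.
    return expanded_response_emb.mean()

def reinforce_step(selector, optimizer, term_embs, response_emb):
    probs = selector(term_embs)                      # keep-probability per PRF term
    dist = torch.distributions.Bernoulli(probs)
    mask = dist.sample()                             # hard 0/1 selection of terms
    # Expand the response with the selected terms (simple additive expansion here).
    expanded = response_emb + (mask.unsqueeze(-1) * term_embs).sum(dim=0)
    reward = rank_score(expanded).detach()           # ranker performance as reward
    # REINFORCE: maximize expected reward by weighting the sample's log-prob.
    loss = -(reward * dist.log_prob(mask).sum())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()

# Toy usage with random embeddings.
selector = PRFSelector(term_dim=32)
opt = torch.optim.Adam(selector.parameters(), lr=1e-3)
terms = torch.randn(10, 32)      # embeddings of 10 candidate PRF terms
response = torch.randn(32)       # embedding of the short response candidate
for _ in range(5):
    reinforce_step(selector, opt, terms, response)
```

Because the selection mask is sampled rather than thresholded, no gradient needs to flow through the discrete choice; this is what lets the method train end-to-end without human annotations, in contrast to the soft-selection baselines the abstract mentions.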
