Mining Implicit Relevance Feedback from User Behavior for Web Question Answering

Paper Title

Mining Implicit Relevance Feedback from User Behavior for Web Question Answering

Authors

Linjun Shou, Shining Bo, Feixiang Cheng, Ming Gong, Jian Pei, Daxin Jiang

Abstract

Training and refreshing a web-scale Question Answering (QA) system for a multilingual commercial search engine often requires a huge number of training examples. One principled idea is to mine implicit relevance feedback from user behavior recorded in search engine logs. All previous work on mining implicit relevance feedback targets the relevance of web documents rather than passages. Due to several unique characteristics of QA tasks, the existing user behavior models for web documents cannot be applied to infer passage relevance. In this paper, we present the first study to explore the correlation between user behavior and passage relevance, and propose a novel approach for mining training data for Web QA. We conduct extensive experiments on four test datasets, and the results show that our approach significantly improves the accuracy of passage ranking without extra human-labeled data. In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine, especially for low-resource languages. Our techniques have been deployed in multilingual services.
