Paper Title

Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

Authors

Asia J. Biega, Jana Schmidt, Rishiraj Saha Roy

Abstract

Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.
