A Large Scale Search Dataset for Unbiased Learning to Rank

Paper title

A Large Scale Search Dataset for Unbiased Learning to Rank

Authors

Zou, Lixin, Mao, Haitao, Chu, Xiaokai, Tang, Jiliang, Ye, Wenwen, Wang, Shuaiqiang, Yin, Dawei

Abstract

The unbiased learning to rank (ULTR) problem has been greatly advanced by recent deep learning techniques and well-designed debiasing algorithms. However, promising results on existing benchmark datasets may not carry over to practical scenarios because of the following shortcomings of those popular benchmarks: (1) outdated semantic feature extraction, where state-of-the-art large-scale pre-trained language models like BERT cannot be exploited because the original text is missing; (2) incomplete display features for in-depth study of ULTR, e.g., the displayed abstract of each document, needed to analyze the click necessary bias, is missing; (3) a lack of real-world user feedback, leading to the prevalence of synthetic datasets in empirical studies. To overcome these shortcomings, we introduce the Baidu-ULTR dataset. It comprises 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries, which is orders of magnitude larger than existing datasets. Baidu-ULTR provides: (1) the original semantic features and a pre-trained language model for easy usage; (2) sufficient display information such as position, displayed height, and displayed abstract, enabling comprehensive study of different biases with advanced techniques such as causal discovery and meta-learning; and (3) rich user feedback on search result pages (SERPs), such as dwell time, allowing for user engagement optimization and promoting the exploration of multi-task learning in ULTR. In this paper, we present the design principles of Baidu-ULTR and the performance of benchmark ULTR algorithms on this new data resource, which favors the exploration of ranking for long-tail queries and of pre-training tasks for ranking. The Baidu-ULTR dataset and the corresponding baseline implementation are available at https://github.com/ChuXiaokai/baidu_ultr_dataset.

Paper title