Policy-Aware Unbiased Learning to Rank for Top-k Rankings

Paper title

Policy-Aware Unbiased Learning to Rank for Top-k Rankings

Authors

Harrie Oosterhuis, Maarten de Rijke

Abstract

Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking. There is currently no existing counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability to appear in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation.
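The idea of a policy-aware propensity can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy position-based click model (a click occurs with probability equal to the examination probability at the item's rank times the item's relevance), a logging policy given explicitly as a distribution over rankings, and hypothetical names such as `policy_aware_propensities`, `exam_prob`, and `item_weight`. Under these assumptions, an item's propensity is its examination probability averaged over the rankings the policy can display, with zero contribution whenever the item falls below the top-k cutoff.

```python
from itertools import permutations
from math import log2
from typing import Dict, Sequence, Tuple

Ranking = Tuple[str, ...]


def policy_aware_propensities(
    policy: Dict[Ranking, float], exam_prob: Sequence[float]
) -> Dict[str, float]:
    """Marginal observation probability of each item under a stochastic policy.

    policy maps each full ranking to its display probability (assumed known).
    exam_prob[i] is the assumed examination probability at rank i (0-based);
    its length is the cutoff k.
    """
    k = len(exam_prob)
    propensity: Dict[str, float] = {}
    for ranking, prob in policy.items():
        for rank, item in enumerate(ranking):
            # Items below the top-k cutoff are never shown, so they add nothing.
            shown = exam_prob[rank] if rank < k else 0.0
            propensity[item] = propensity.get(item, 0.0) + prob * shown
    return propensity


def expected_ips_estimate(
    policy: Dict[Ranking, float],
    exam_prob: Sequence[float],
    relevance: Dict[str, float],
    item_weight: Dict[str, float],
) -> float:
    """Exact expectation of the inverse-propensity estimate over logged clicks."""
    k = len(exam_prob)
    rho = policy_aware_propensities(policy, exam_prob)
    total = 0.0
    for ranking, prob in policy.items():
        for rank, item in enumerate(ranking[:k]):
            # Toy click model: click = examined at this rank AND relevant.
            click_prob = exam_prob[rank] * relevance[item]
            total += prob * click_prob * item_weight[item] / rho[item]
    return total


# Toy setup: three items, top-2 display, uniformly random logging policy.
items = ("a", "b", "c")
rankings = list(permutations(items))
policy = {r: 1.0 / len(rankings) for r in rankings}
exam_prob = [1.0, 0.5]
relevance = {"a": 1.0, "b": 0.0, "c": 1.0}

# DCG@2-style weights of each item in the ranking being evaluated.
new_ranking = ("c", "a", "b")
item_weight = {
    d: (1.0 / log2(i + 2) if i < len(exam_prob) else 0.0)
    for i, d in enumerate(new_ranking)
}

true_value = sum(relevance[d] * item_weight[d] for d in items)
estimate = expected_ips_estimate(policy, exam_prob, relevance, item_weight)
```

In this toy setup every item has a non-zero chance of reaching the top 2, and the expected estimate matches the true metric value. With a deterministic logging policy, the item fixed at rank 3 would have zero propensity, which is the case where the abstract's unbiasedness condition fails.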
