Paper title
Doubly-Robust Estimation for Correcting Position-Bias in Click Feedback for Unbiased Learning to Rank
Paper authors
Paper abstract
Clicks on rankings suffer from position-bias: items on lower ranks are generally less likely to be examined, and thus clicked, by users, regardless of the users' actual preferences between items. The prevalent approach to unbiased click-based learning-to-rank (LTR) is based on counterfactual inverse-propensity-scoring (IPS) estimation. In contrast with general reinforcement learning, counterfactual doubly-robust (DR) estimation has not been applied to click-based LTR in previous literature. In this paper, we introduce a novel DR estimator that is the first DR approach specifically designed for position-bias. The difficulty with position-bias is that the treatment (user examination) is not directly observable in click data. As a solution, our estimator uses the expected treatment per rank, instead of the actual treatment that existing DR estimators use. Our novel DR estimator has more robust unbiasedness conditions than the existing IPS approach and, in addition, provides enormous decreases in variance: our experimental results indicate that it requires several orders of magnitude fewer datapoints to converge at optimal performance. For the unbiased LTR field, our DR estimator contributes both increases in state-of-the-art performance and the most robust theoretical guarantees of all known LTR estimators.
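The contrast between IPS and a DR estimator built on the expected treatment per rank can be illustrated with a small sketch. This is not the paper's exact estimator: it assumes a plain position-based click model (click probability = examination probability at the rank times item relevance, with no trust bias), and every name in it (`expected_exam`, `ips_estimate`, `dr_estimate`, `alpha`, `rho`, `omega`, `r_hat`) is hypothetical. Here `alpha[k]` plays the role of the expected treatment at rank `k`, `rho` and `omega` are the expected examination probabilities under the logging and target policies, and `r_hat` is a regression estimate of relevance.

```python
from typing import Dict, List, Sequence, Tuple

Ranking = Tuple[str, ...]                     # item ids, top rank first
Policy = List[Tuple[Ranking, float]]          # (ranking, probability of showing it)
Log = List[Tuple[Ranking, Dict[str, float]]]  # (shown ranking, click signal per item)


def expected_exam(policy: Policy, alpha: Sequence[float]) -> Dict[str, float]:
    """Expected examination probability of each item under a ranking policy."""
    exam: Dict[str, float] = {}
    for ranking, prob in policy:
        for k, d in enumerate(ranking):
            exam[d] = exam.get(d, 0.0) + prob * alpha[k]
    return exam


def ips_estimate(log: Log, rho: Dict[str, float], omega: Dict[str, float]) -> float:
    """IPS: reweight every click by omega / rho (assumed simplified form)."""
    total = 0.0
    for ranking, clicks in log:
        for d in ranking:
            total += omega[d] / rho[d] * clicks[d]
    return total / len(log)


def dr_estimate(log: Log, rho: Dict[str, float], omega: Dict[str, float],
                alpha: Sequence[float], r_hat: Dict[str, float]) -> float:
    """DR sketch: direct-method estimate plus an IPS-weighted correction.

    The correction subtracts alpha[k] * r_hat[d], the click probability the
    regression model predicts at rank k. It uses the *expected* examination
    at the rank because the actual examination is not observed in click logs.
    """
    direct = sum(omega[d] * r_hat[d] for d in omega)
    correction = 0.0
    for ranking, clicks in log:
        for k, d in enumerate(ranking):
            correction += omega[d] / rho[d] * (clicks[d] - alpha[k] * r_hat[d])
    return direct + correction / len(log)


# Toy setup (all numbers are made up for illustration).
alpha = [1.0, 0.5, 0.25]                       # examination probability per rank
relevance = {"a": 0.9, "b": 0.5, "c": 0.1}     # true relevance, unknown in practice
logging_policy: Policy = [(("a", "b", "c"), 0.5), (("c", "b", "a"), 0.5)]
target_policy: Policy = [(("a", "b", "c"), 1.0)]

rho = expected_exam(logging_policy, alpha)
omega = expected_exam(target_policy, alpha)

# Real logs hold 0/1 clicks. Both estimators are linear in the clicks, so
# plugging in expected clicks (one entry per equally likely logged ranking)
# yields each estimator's expected value exactly.
log: Log = [(r, {d: alpha[k] * relevance[d] for k, d in enumerate(r)})
            for r, _ in logging_policy]

true_reward = sum(omega[d] * relevance[d] for d in relevance)
poor_r_hat = {d: 0.3 for d in relevance}       # deliberately inaccurate regression

print(true_reward)
print(ips_estimate(log, rho, omega))
print(dr_estimate(log, rho, omega, alpha, poor_r_hat))
```

With correct `alpha` values, both estimates match the true reward in expectation even though `poor_r_hat` is inaccurate. When `r_hat` is close to the true relevance, the correction term stays small on real 0/1 click logs, which is the intuition behind the variance reduction reported in the abstract.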