Paper Title
Counterfactual Learning to Rank using Heterogeneous Treatment Effect Estimation
Authors
Abstract
Learning-to-Rank (LTR) models trained on implicit feedback (e.g., clicks) suffer from inherent biases. A well-known one is position bias: documents in top positions are more likely to receive clicks, due in part to their positional advantage. To learn unbiased rankings, existing counterfactual frameworks first estimate the propensity (probability) of missing clicks using intervention data from a small portion of search traffic, and then use the inverse propensity score (IPS) to debias LTR algorithms on the whole data set. These approaches often assume that the propensity depends only on the position of the document, which may cause high estimation variance in applications where the search context (e.g., query, user) varies frequently. While context-dependent propensity models reduce variance, accurate estimation may require randomization or intervention on a large amount of traffic, which may not be realistic in real-world systems, especially for long-tail queries. In this work, we employ heterogeneous treatment effect estimation techniques to estimate position bias when intervention click data is limited. We then use these estimates to debias the observed click distribution and re-draw a new de-biased data set, which can be used with any LTR algorithm. We conduct simulations under varying experimental conditions and show the effectiveness of the proposed method in regimes with long-tail queries and sparse clicks.
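The debias-and-redraw step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a position-only propensity table (the simpler baseline the abstract contrasts against), and the names `ips_weights`, `redraw_debiased`, the toy click log, and the propensity values are all hypothetical. In the proposed method, the propensity lookup would presumably be replaced by a context-dependent estimate obtained from heterogeneous treatment effect estimation.

```python
import random

def ips_weights(log, propensity):
    """Weight each impression by click / propensity of its position.

    `log` holds (query, doc, position, click) tuples; `propensity` maps a
    position to the assumed probability that a click there is observed.
    """
    return [click / propensity[pos] for (_, _, pos, click) in log]

def redraw_debiased(log, weights, n, seed=0):
    """Resample n records with probability proportional to their IPS weight."""
    rng = random.Random(seed)
    return rng.choices(log, weights=weights, k=n)

# Hypothetical toy click log: (query, doc, position, click)
log = [
    ("q1", "a", 1, 1),
    ("q1", "b", 2, 0),
    ("q1", "c", 3, 1),
    ("q2", "d", 1, 1),
    ("q2", "e", 3, 1),
]

# Hypothetical position-only propensities (assumed, not estimated here)
propensity = {1: 1.0, 2: 0.5, 3: 0.25}

weights = ips_weights(log, propensity)
debiased = redraw_debiased(log, weights, n=1000)
```

Clicks observed at low-propensity positions receive larger weights, so they are over-sampled in the redrawn data set relative to the raw log, while unclicked impressions receive zero weight and are never drawn. The resulting sample could then be passed to an LTR algorithm without further IPS correction.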