Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

Paper Title

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

Paper Authors

Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto

Paper Abstract

In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is critical. Off-policy evaluation (OPE) for ranking policies is thus gaining a growing interest because it enables performance estimation of new ranking policies using only logged data. Although OPE in contextual bandits has been studied extensively, its naive application to the ranking setting faces a critical variance issue due to the huge item space. To tackle this problem, previous studies introduce some assumptions on user behavior to make the combinatorial item space tractable. However, an unrealistic assumption may, in turn, cause serious bias. Therefore, appropriately controlling the bias-variance tradeoff by imposing a reasonable assumption is the key to success in OPE of ranking policies. To achieve a well-balanced bias-variance tradeoff, we propose the Cascade Doubly Robust estimator building on the cascade assumption, which assumes that a user interacts with items sequentially from the top position in a ranking. We show that the proposed estimator is unbiased in more cases compared to existing estimators that make stronger assumptions. Furthermore, compared to a previous estimator based on the same cascade assumption, the proposed estimator reduces the variance by leveraging a control variate. Comprehensive experiments on both synthetic and real-world data demonstrate that our estimator leads to more accurate OPE than existing estimators in a variety of settings.
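The abstract describes the estimator only in words. The following is a minimal Python/NumPy sketch of a per-position doubly robust estimator under the cascade assumption, written for illustration only; it is not the authors' implementation, and the exact form of Cascade DR in the paper may differ in detail. For each position k it weights the reward r(k) by the importance weight of the prefix a(1:k) alone, w(x, a(1:k)) = prod over l <= k of pi_e(a(l) | x, a(1:l-1)) / pi_0(a(l) | x, a(1:l-1)), which is where the cascade assumption enters. It then subtracts a reward model q_hat_k as a control variate and adds back that model's expectation over a(k) ~ pi_e, weighted by w(x, a(1:k-1)). With q_hat set to zero it reduces to a cascade IPS estimator. The function names (cascade_dr, expected_rewards, simulate), the array layout, and the two-slot toy simulation (item attractiveness values, uniform logging policy, slots drawn independently with replacement) are all hypothetical.

```python
import numpy as np


def cascade_dr(rewards, pscore_e, pscore_0, q_hat, q_hat_expected, alpha=None):
    """Per-position doubly robust value estimate under the cascade assumption.

    Hypothetical interface; every 2-D input has shape (n, K):
      rewards[i, k]         observed reward r_i(k) at position k
      pscore_e[i, k]        pi_e(a_i(k) | x_i, a_i(1:k-1))  (evaluation policy)
      pscore_0[i, k]        pi_0(a_i(k) | x_i, a_i(1:k-1))  (logging policy)
      q_hat[i, k]           reward model q_hat_k(x_i, a_i(1:k)) on the logged prefix
      q_hat_expected[i, k]  sum_a pi_e(a | x_i, a_i(1:k-1)) * q_hat_k(x_i, a_i(1:k-1), a)
    alpha: position weights, shape (K,); defaults to all ones.
    Passing all-zero q_hat and q_hat_expected gives the cascade IPS estimator.
    """
    rewards = np.asarray(rewards, dtype=float)
    n, K = rewards.shape
    alpha = np.ones(K) if alpha is None else np.asarray(alpha, dtype=float)
    ratio = np.asarray(pscore_e, dtype=float) / np.asarray(pscore_0, dtype=float)
    # Cascade assumption: r(k) depends only on a(1:k), so it is weighted by the
    # importance weight of that prefix rather than of the whole ranking.
    w = np.cumprod(ratio, axis=1)                     # w(x, a(1:k))
    w_prev = np.hstack([np.ones((n, 1)), w[:, :-1]])  # w(x, a(1:k-1)); empty prefix -> 1
    # Importance-weighted residual plus the control variate's expectation under pi_e.
    per_position = (w * (rewards - np.asarray(q_hat, dtype=float))
                    + w_prev * np.asarray(q_hat_expected, dtype=float))
    return float(np.mean(per_position @ alpha))


def expected_rewards(a, p):
    """Toy mean rewards for K = 2 slots (hypothetical model).

    Slot 1 earns p[a1]; slot 2 earns p[a2] discounted by (1 - p[a1]), so the
    mean reward at slot k depends only on the prefix a(1:k).
    """
    return np.column_stack([p[a[:, 0]], p[a[:, 1]] * (1 - p[a[:, 0]])])


def simulate(n=100_000, seed=0):
    """Compare cascade IPS and the DR sketch with the exact value on toy data."""
    rng = np.random.default_rng(seed)
    attract = np.array([0.5, 0.3, 0.1])  # hypothetical item attractiveness
    attract_hat = attract + 0.05         # deliberately biased reward model
    pi_0 = np.full(3, 1 / 3)             # uniform logging policy per slot
    pi_e = np.array([0.6, 0.3, 0.1])     # evaluation policy per slot
    # Both slots are drawn independently, with replacement, to keep the toy simple.
    a = rng.choice(3, size=(n, 2), p=pi_0)
    r = rng.binomial(1, expected_rewards(a, attract))
    # Exact value with v = sum_a pi_e(a) * attract[a]: slot 1 gives v, slot 2 gives v * (1 - v).
    v = pi_e @ attract
    true_value = v + v * (1 - v)
    # Reward model on the logged prefix and its expectation over a(k) ~ pi_e.
    q_hat = expected_rewards(a, attract_hat)
    v_hat = pi_e @ attract_hat
    q_hat_expected = np.column_stack(
        [np.full(n, v_hat), v_hat * (1 - attract_hat[a[:, 0]])])
    pscore_e, pscore_0 = pi_e[a], pi_0[a]
    zeros = np.zeros_like(q_hat)
    ips = cascade_dr(r, pscore_e, pscore_0, zeros, zeros)
    dr = cascade_dr(r, pscore_e, pscore_0, q_hat, q_hat_expected)
    return true_value, ips, dr


if __name__ == "__main__":
    true_value, ips, dr = simulate()
    print(f"true={true_value:.4f}  cascade IPS={ips:.4f}  DR={dr:.4f}")
```

In this sketch the reward model enters only through the difference between q_hat on the logged prefix and its expectation under pi_e, so a biased q_hat (as in the toy run) leaves the estimate unbiased as long as the propensities are correct, pi_0 covers the support of pi_e, and the cascade assumption holds. A reasonably accurate q_hat shrinks the importance-weighted residuals and hence the variance relative to cascade IPS.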
