Paper Title
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits
Paper Authors
Paper Abstract
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems, since they allow efficient reuse of existing log data. However, there are fundamental limits to using existing log data alone, since the counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated. To overcome this limitation, we explore the question of how to design data-gathering policies that most effectively augment an existing dataset of bandit feedback with additional observations for both learning and evaluation. To this effect, this paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem. We explore multiple approaches to computing MVAL policies efficiently, and find that they can be substantially more effective in decreasing the variance of an estimator than naïve approaches.
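To make the variance-minimization objective concrete, below is a minimal, self-contained Python sketch of the core idea: among candidate logging policies, pick the one that minimizes the variance of the standard inverse-propensity-scoring (IPS) estimator for a fixed target policy. This is an illustration only, not the paper's MVAL algorithm: it uses a toy discrete-action problem with known, deterministic mean rewards, searches the probability simplex by random sampling, and ignores how the new samples combine with the pre-existing log; all names (`ips_variance`, `mean_r`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ips_estimate(target_probs, logging_probs, actions, rewards):
    # Standard IPS estimate of the target policy's value from logged feedback.
    weights = target_probs[actions] / logging_probs[actions]
    return float(np.mean(weights * rewards))

def ips_variance(target_probs, logging_probs, mean_rewards):
    # Variance of a single IPS sample when actions are drawn from
    # logging_probs, assuming (for simplicity) deterministic rewards.
    w = target_probs / logging_probs
    second_moment = np.sum(logging_probs * (w * mean_rewards) ** 2)
    value = np.sum(target_probs * mean_rewards)
    return float(second_moment - value ** 2)

# Toy problem: 3 actions; rewards are known here only for illustration.
target = np.array([0.7, 0.2, 0.1])  # target policy to evaluate
mean_r = np.array([0.9, 0.5, 0.2])  # mean rewards (unknown in practice)

naive = np.ones(3) / 3              # naive augmentation: log uniformly

# "Variance-optimal" logging in spirit: search the simplex for the
# augmentation policy that minimizes single-sample IPS variance.
candidates = rng.dirichlet(np.ones(3), size=20000)
variances = np.array([ips_variance(target, c, mean_r) for c in candidates])
best = candidates[variances.argmin()]

print("uniform logging variance:", ips_variance(target, naive, mean_r))
print("best-found logging policy:", best.round(3), "variance:", variances.min())

# Sanity check: simulate logs under each policy and compare IPS estimates.
def simulate(logging_probs, n=2000):
    actions = rng.choice(3, size=n, p=logging_probs)
    rewards = mean_r[actions]       # deterministic rewards in this toy setup
    return ips_estimate(target, logging_probs, actions, rewards)

print("IPS under uniform:", simulate(naive))
print("IPS under best   :", simulate(best))
```

In this toy setting with deterministic nonnegative rewards, standard importance-sampling theory says the variance-optimal logging policy is proportional to the product of target probability and reward, driving the IPS variance toward zero, and the random search above recovers that. In the paper's setting rewards are noisy and unknown, and the logging policy must also account for the existing log data, so the actual MVAL construction is considerably more involved.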