Paper Title

Improving Offline Contextual Bandits with Distributional Robustness

Authors

Sakhi, Otmane, Faury, Louis, Vasile, Flavian

Abstract

This paper extends the Distributionally Robust Optimization (DRO) approach for offline contextual bandits. Specifically, we leverage this framework to introduce a convex reformulation of the Counterfactual Risk Minimization principle. Besides relying on convex programs, our approach is compatible with stochastic optimization, and can therefore be readily adapted to the large data regime. Our approach relies on the construction of asymptotic confidence intervals for offline contextual bandits through the DRO framework. By leveraging known asymptotic results of robust estimators, we also show how to automatically calibrate such confidence intervals, which in turn removes the burden of hyper-parameter selection for policy optimization. We present preliminary empirical results supporting the effectiveness of our approach.
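To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of the kind of DRO lower confidence bound the abstract describes: given importance-weighted rewards from logged bandit data, the worst-case mean over a KL-divergence ball around the empirical distribution admits a convex dual, `sup_{a>0} -a*log((1/n)*sum_i exp(-x_i/a)) - a*eps`, which can be maximized by a simple search over `a`. The data, the radius `eps`, and the grid of dual variables are all illustrative assumptions.

```python
import math
import random


def kl_dro_lower_bound(samples, eps):
    """Dual form of the KL-ball DRO lower bound on the mean:
       min_{Q: KL(Q || P_n) <= eps} E_Q[X]
         = sup_{a > 0} -a * log( (1/n) * sum_i exp(-x_i / a) ) - a * eps
    Maximized here by a coarse grid search over the dual variable a
    (a convex solver or bisection would be used in practice)."""
    n = len(samples)
    best = -float("inf")
    # illustrative log-spaced grid for the dual variable
    alphas = [10 ** (-2 + 0.05 * i) for i in range(120)]
    for a in alphas:
        # numerically stable log-mean-exp of (-x_i / a)
        m = max(-x / a for x in samples)
        lme = m + math.log(sum(math.exp(-x / a - m) for x in samples) / n)
        best = max(best, -a * lme - a * eps)
    return best


# hypothetical logged data: importance-weighted rewards w_i * r_i
random.seed(0)
iw_rewards = [random.random() for _ in range(1000)]

empirical_mean = sum(iw_rewards) / len(iw_rewards)
lb_tight = kl_dro_lower_bound(iw_rewards, eps=0.01)   # small ambiguity set
lb_loose = kl_dro_lower_bound(iw_rewards, eps=0.10)   # larger ambiguity set
```

The bound is always below the empirical mean (robustness penalty) and shrinks as `eps` grows; calibrating `eps` from asymptotic theory, as the paper proposes, turns this lower bound into an automatic confidence interval, so maximizing it over policies gives a pessimistic policy-optimization objective without hand-tuned hyper-parameters.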
