Algorithms for slate bandits with non-separable reward functions

Paper title

Algorithms for slate bandits with non-separable reward functions

Authors

Jason Rhuggenaath, Alp Akcay, Yingqian Zhang, Uzay Kaymak

Abstract

In this paper, we study a slate bandit problem where the function that determines the slate-level reward is non-separable: the optimal value of the function cannot be determined by learning the optimal action for each slot. We are mainly concerned with cases where the number of slates is large relative to the time horizon, so that trying each slate as a separate arm in a traditional multi-armed bandit would not be feasible. Our main contribution is the design of algorithms that still have sub-linear regret with respect to the time horizon, despite the large number of slates. Experimental results on simulated data and real-world data show that our proposed method outperforms popular benchmark bandit algorithms.
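
To make the notion of non-separability concrete, here is a minimal sketch. It is not the paper's algorithm. The reward table, the constants, and the function names (`best_slate`, `per_slot_greedy_slate`, `num_slates`) are hypothetical and chosen only for illustration. The sketch shows a toy slate with two slots where picking the action with the best average reward in each slot independently misses the best slate, and it counts how quickly the number of slates grows.

```python
# Hypothetical expected rewards for a toy slate: 2 slots, 2 actions per slot.
# These numbers are made up for illustration; they do not come from the paper.
EXPECTED_REWARD = {
    (0, 0): 1.0,
    (0, 1): 0.0,
    (1, 0): 0.6,
    (1, 1): 0.6,
}
NUM_SLOTS = 2
NUM_ACTIONS = 2


def best_slate():
    """Return the slate with the highest expected reward (exhaustive search)."""
    return max(EXPECTED_REWARD, key=EXPECTED_REWARD.get)


def per_slot_greedy_slate():
    """Pick, for each slot independently, the action with the best average
    reward over all slates that contain it."""
    chosen = []
    for slot in range(NUM_SLOTS):
        def marginal(action, slot=slot):
            rewards = [r for s, r in EXPECTED_REWARD.items() if s[slot] == action]
            return sum(rewards) / len(rewards)
        chosen.append(max(range(NUM_ACTIONS), key=marginal))
    return tuple(chosen)


def num_slates(num_slots, num_actions):
    """Number of distinct slates when every slot has the same action set."""
    return num_actions ** num_slots
```

In this toy table the exhaustive search returns the slate `(0, 0)`, while the per-slot choice returns `(1, 0)`, whose expected reward is lower. With, say, 5 slots and 10 actions per slot, `num_slates(5, 10)` gives 100,000 slates, which illustrates why treating each slate as a separate arm becomes impractical when the time horizon is short.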
