Paper Title
Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
Paper Authors
Paper Abstract
Multi-armed bandits are widely applied in scenarios such as recommender systems, where the goal is to maximize the click-through rate. However, more factors should be considered, e.g., user stickiness, user growth rate, and user experience assessment. In this paper, we model this situation as a $K$-armed bandit problem with multiple losses. We define the relative loss vector of an arm, whose $i$-th entry compares the arm against the optimal arm with respect to the $i$-th loss. We study two goals: (a) finding the arm with the minimum $\ell^\infty$-norm of relative losses at a given confidence level (fixed-confidence best-arm identification); (b) minimizing the $\ell^\infty$-norm of the cumulative relative losses (regret minimization). For goal (a), we derive a problem-dependent sample complexity lower bound and discuss how to achieve matching algorithms. For goal (b), we prove a regret lower bound of $\Omega(T^{2/3})$ and provide a matching algorithm.
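To make the two objectives concrete, a minimal formalization consistent with the abstract follows; the symbols $d$ (number of losses), $\ell_i(k)$ (expected $i$-th loss of arm $k$), and $\Delta_i(k)$ are our own illustrative notation, not definitions taken from the paper. Under this reading, the relative loss vector of arm $k$ has entries
$$\Delta_i(k) = \ell_i(k) - \min_{1 \le k' \le K} \ell_i(k'), \qquad i = 1, \dots, d,$$
so each entry compares arm $k$ with the best arm for the $i$-th loss. Goal (a) then asks, with confidence at least $1-\delta$, for an arm in
$$\operatorname*{arg\,min}_{1 \le k \le K} \; \max_{1 \le i \le d} \Delta_i(k),$$
while goal (b) asks, over $T$ rounds with pulled arms $k_1, \dots, k_T$, to minimize
$$\max_{1 \le i \le d} \; \sum_{t=1}^{T} \Delta_i(k_t).$$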