Paper Title
Revisiting Simple Regret: Fast Rates for Returning a Good Arm
Paper Authors
Paper Abstract
Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to the lack of easy ways to characterize it. In this paper, we make significant progress on minimizing simple regret in both the data-rich ($T \ge n$) and data-poor ($T \le n$) regimes, where $n$ is the number of arms and $T$ is the number of samples. At its heart is our improved instance-dependent analysis of the well-known Sequential Halving (SH) algorithm, where we bound the probability of returning an arm whose mean reward is not within $\epsilon$ of the best (i.e., not $\epsilon$-good) for \textit{any} choice of $\epsilon > 0$, even though $\epsilon$ is not an input to SH. Our bound not only leads to an optimal worst-case simple regret bound of $\sqrt{n/T}$ up to logarithmic factors but also essentially matches the instance-dependent lower bound for returning an $\epsilon$-good arm reported by Katz-Samuels and Jamieson (2020). For the more challenging data-poor regime, we propose Bracketing SH (BSH), which enjoys the same improvement even without sampling each arm at least once. Our empirical study shows that BSH outperforms existing methods on real-world tasks.
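For reference, simple regret measures the expected gap between the best mean reward and that of the arm the algorithm returns. A standard way to write it (the notation $\hat{I}_T$ for the returned arm is assumed here, not taken from the abstract) is

$$ \mathrm{SReg}(T) \;=\; \mathbb{E}\Big[\max_{i \in [n]} \mu_i \;-\; \mu_{\hat{I}_T}\Big], $$

where $\mu_i$ denotes the mean reward of arm $i$. This makes it parameter-free: unlike the probability of returning an $\epsilon$-good arm, no threshold $\epsilon$ has to be chosen up front.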
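The abstract references the well-known Sequential Halving routine without restating it. Below is a minimal sketch of the standard SH procedure (as described in prior work, not this paper's improved analysis or BSH), under an assumed interface: `pull(arm)` is a hypothetical function returning one stochastic reward for the given arm.

```python
import math


def sequential_halving(pull, n, T):
    """Sketch of Sequential Halving: repeatedly sample the surviving arms
    with an even budget split, then discard the empirically worse half."""
    survivors = list(range(n))
    rounds = max(1, math.ceil(math.log2(n)))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Even split of the budget: first across rounds, then across survivors.
        pulls_per_arm = max(1, T // (len(survivors) * rounds))
        means = {
            arm: sum(pull(arm) for _ in range(pulls_per_arm)) / pulls_per_arm
            for arm in survivors
        }
        # Keep the empirically better half (rounded up) for the next round.
        survivors.sort(key=lambda arm: means[arm], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


# Toy usage: 10 Gaussian arms, arm 9 is best; budget of T = 4000 samples.
if __name__ == "__main__":
    import random

    mus = [0.5] * 9 + [0.9]
    chosen = sequential_halving(lambda a: random.gauss(mus[a], 1.0), n=10, T=4000)
    print("returned arm:", chosen)
```

The sketch also makes concrete why the data-poor regime ($T \le n$) is harder: SH needs at least one pull per arm in its first round, which is impossible when $T < n$; per the abstract, Bracketing SH (BSH) is designed to avoid sampling every arm.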