论文标题

关于所有$ \ varepsilon $ best Arms标识的复杂性

On the complexity of All $\varepsilon$-Best Arms Identification

论文作者

Marjani, Aymen Al, Kocák, Tomáš, Garivier, Aurélien

论文摘要

我们考虑\ cite {mason2020}引入的问题,该问题是在有限的随机多臂强盗和高斯奖励中识别所有$ \ varepsilon $ - 最佳的武器。我们在任何算法的样本复杂性上给出了两个下限,以至少$ 1-δ$解决问题。第一个在渐近制度中不可解决的是,当风险$δ$变为零时,其平均样品复杂性在渐近上是最佳的,它的平均样品复杂性在零上是最佳的。值得注意的是,我们提供了一种有效的数值方法来求解出现在下限中的凸最大程序。我们的方法基于对最佳采样策略需要排除的替代匪徒实例的完整表征,从而使我们的界限比\ cite {mason2020}提供的界限更紧密。第二个下限涉及风险$δ$的高和中等值的状态,并表征了初始阶段任何算法的行为。它强调了武器数量中样品复杂性的线性依赖性。最后,我们报告了数值模拟,证明了我们的算法比最新方法的优势,即使是适度的风险也是如此。

We consider the question introduced by \cite{Mason2020} of identifying all the $\varepsilon$-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. We give two lower bounds on the sample complexity of any algorithm solving the problem with a confidence at least $1-δ$. The first, unimprovable in the asymptotic regime, motivates the design of a Track-and-Stop strategy whose average sample complexity is asymptotically optimal when the risk $δ$ goes to zero. Notably, we provide an efficient numerical method to solve the convex max-min program that appears in the lower bound. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by \cite{Mason2020}. The second lower bound deals with the regime of high and moderate values of the risk $δ$, and characterizes the behavior of any algorithm in the initial phase. It emphasizes the linear dependency of the sample complexity in the number of arms. Finally, we report on numerical simulations demonstrating our algorithm's advantage over state-of-the-art methods, even for moderate risks.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源