Paper title
Dealing with Unknown Variances in Best-Arm Identification
Paper authors
Paper abstract
The problem of identifying the best arm among a collection of items with Gaussian reward distributions is well understood when the variances are known. Despite its practical relevance for many applications, few works have studied it for unknown variances. In this paper, we introduce and analyze two approaches to deal with unknown variances, either by plugging in the empirical variance or by adapting the transportation costs. To calibrate our two stopping rules, we derive new time-uniform concentration inequalities, which are of independent interest. Then, we illustrate the theoretical and empirical performance of our two sampling rule wrappers on Track-and-Stop and on a Top Two algorithm. Moreover, by quantifying the impact on the sample complexity of not knowing the variances, we reveal that it is rather small.
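To make the first approach concrete, here is a minimal sketch of a plug-in stopping statistic. It assumes the standard known-variance Gaussian transportation cost between the empirical best arm and a challenger, with each unknown variance replaced by its empirical estimate. The function names (`empirical_stats`, `plugin_statistic`, `should_stop`) and the `threshold` argument are hypothetical. The calibrated thresholds derived in the paper are not reproduced here, so the caller must supply one.

```python
import math


def empirical_stats(samples):
    """Return (count, empirical mean, biased empirical variance) of one arm."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return n, mean, var


def plugin_statistic(rewards):
    """Return (empirical best arm, smallest plug-in transportation cost).

    rewards: list of per-arm sample lists, each with at least two samples.
    The cost between the best arm b and a challenger a is
        (mu_b - mu_a)^2 / (2 * (var_b / n_b + var_a / n_a)),
    i.e. the known-variance Gaussian cost with empirical variances plugged in.
    """
    stats = [empirical_stats(r) for r in rewards]
    best = max(range(len(stats)), key=lambda a: stats[a][1])
    n_b, mu_b, var_b = stats[best]
    costs = []
    for a, (n_a, mu_a, var_a) in enumerate(stats):
        if a == best:
            continue
        denom = 2.0 * (var_b / n_b + var_a / n_a)
        # Degenerate case: both empirical variances are zero.
        cost = math.inf if denom == 0.0 else (mu_b - mu_a) ** 2 / denom
        costs.append(cost)
    return best, min(costs)


def should_stop(rewards, threshold):
    """Stop once every challenger is separated by more than `threshold`.

    `threshold` is an assumption supplied by the caller, not the paper's
    calibrated value.
    """
    _, stat = plugin_statistic(rewards)
    return stat > threshold


if __name__ == "__main__":
    data = [[1.0, 3.0], [-1.0, 1.0], [0.0, 2.0]]
    print(plugin_statistic(data))
```

The second approach mentioned in the abstract, adapting the transportation costs themselves, would change the cost formula rather than only the variance estimate, and is not shown in this sketch.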