论文标题
基于UCB的前两种算法的非反应分析
Non-Asymptotic Analysis of a UCB-based Top Two Algorithm
论文作者
论文摘要
强盗识别的前两个采样规则是一种方法,它选择了从两个候选臂,一个领导者和一个挑战者中进行采样的下一个手臂。由于它们的简单性和良好的经验表现,近年来他们受到了越来越多的关注。但是,对于固定信心最佳臂识别,仅在误差级别消失时,才在渐近方案中获得了前两种方法的理论保证。在本文中,我们在前两个算法的预期样品复杂性上得出了第一个非反应上限,该算法符合任何误差水平。我们的分析强调了足够的属性,以使最小化算法用作领导者。这些属性得到了UCB算法的满足,我们提出的基于UCB的前两种算法同时享有非质子保证和竞争性的经验性能。
A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, they have received increased attention in recent years. However, for fixed-confidence best arm identification, theoretical guarantees for Top Two methods have only been obtained in the asymptotic regime, when the error level vanishes. In this paper, we derive the first non-asymptotic upper bound on the expected sample complexity of a Top Two algorithm, which holds for any error level. Our analysis highlights sufficient properties for a regret minimization algorithm to be used as leader. These properties are satisfied by the UCB algorithm, and our proposed UCB-based Top Two algorithm simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.