论文标题
汤普森抽样无监督的顺序选择
Thompson Sampling for Unsupervised Sequential Selection
论文作者
论文摘要
汤普森采样引起了人们对基于置信度结合算法更好的经验性能的重大兴趣。在本文中,我们研究了基于汤普森抽样的算法,以解决无监督的顺序选择(US)问题。 USS问题是随机多臂匪徒问题的变体,在观察到的反馈中无法推断出手臂的丢失。在USS设置中,武器与固定成本相关联并订购,形成级联。在每一轮中,学习者选择一个手臂,并观察从手臂到选定的手臂的反馈。学习者的目标是找到最大程度地减少预期总损失的手臂。总损失是选择手臂的成本和与选定的手臂相关的随机损失的总和。这个问题是具有挑战性的,因为在不知道平均损失的情况下,人们无法计算所选手臂的总损失。显然,只有从问题结构中推断出最佳的臂时,学习才是可行的。如先前的工作中所示,当问题实例满足所谓的“弱位”(WD)属性时,学习是可能的。在WD下,我们表明我们的基于汤普森抽样的算法对于USS问题而言,您的遗憾几乎是最佳的遗憾,并且比现有算法具有更好的数值性能。
Thompson Sampling has generated significant interest due to its better empirical performance than upper confidence bound based algorithms. In this paper, we study Thompson Sampling based algorithm for Unsupervised Sequential Selection (USS) problem. The USS problem is a variant of the stochastic multi-armed bandits problem, where the loss of an arm can not be inferred from the observed feedback. In the USS setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, the learner selects an arm and observes the feedback from arms up to the selected arm. The learner's goal is to find the arm that minimizes the expected total loss. The total loss is the sum of the cost incurred for selecting the arm and the stochastic loss associated with the selected arm. The problem is challenging because, without knowing the mean loss, one cannot compute the total loss for the selected arm. Clearly, learning is feasible only if the optimal arm can be inferred from the problem structure. As shown in the prior work, learning is possible when the problem instance satisfies the so-called `Weak Dominance' (WD) property. Under WD, we show that our Thompson Sampling based algorithm for the USS problem achieves near optimal regret and has better numerical performance than existing algorithms.