在线算法，用于无监督的顺序选择，并使用上下文信息

论文标题

在线算法，用于无监督的顺序选择，并使用上下文信息

Online Algorithm for Unsupervised Sequential Selection with Contextual Information

论文作者

Verma, Arun, Hanawal, Manjesh K., Szepesvári, Csaba, Saligrama, Venkatesh

论文摘要

在本文中，我们研究了上下文无监督的顺序选择（USS），这是随机上下文匪徒问题的新变体，其中无法从观察到的反馈中推断出手臂的丢失。在我们的设置中，武器与固定成本相关联并订购，形成级联。在每个回合中，都会呈现一个上下文，学习者依次选择手臂直至一定深度。停止在手臂上产生的总成本是所选武器的固定成本和与手臂相关的随机损失的总和。学习者的目标是学习一条决策规则，该规则将上下文绘制以最大程度地减少预期损失的目标。由于无法估算总损失，因此问题很具有挑战性，因为我们面临着无监督的环境。显然，只有从问题结构中推断出最佳臂（明确或隐式）时，学习才能可行。我们观察到，当问题实例满足所谓的“上下文弱统治”（CWD）属性时，学习仍然是可能的。在CWD下，我们为上下文USS问题提出了一种算法，并证明了它具有次线性遗憾。合成数据集的实验验证了我们的算法。

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, a context is presented, and the learner selects the arms sequentially till some depth. The total cost incurred by stopping at an arm is the sum of fixed costs of arms selected and the stochastic loss associated with the arm. The learner's goal is to learn a decision rule that maps contexts to arms with the goal of minimizing the total expected loss. The problem is challenging as we are faced with an unsupervised setting as the total loss cannot be estimated. Clearly, learning is feasible only if the optimal arm can be inferred (explicitly or implicitly) from the problem structure. We observe that learning is still possible when the problem instance satisfies the so-called 'Contextual Weak Dominance' (CWD) property. Under CWD, we propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret. Experiments on synthetic and real datasets validate our algorithm.

下载PDF全文

下载文献需遵守相关版权规定

论文标题