Paper Title


Stochastic Low-rank Tensor Bandits for Multi-dimensional Online Decision Making

Authors

Jie Zhou, Botao Hao, Zheng Wen, Jingfei Zhang, Will Wei Sun

Abstract


Multi-dimensional online decision making plays a crucial role in many real applications such as online recommendation and digital marketing. In these problems, a decision at each time is a combination of choices from different types of entities. To solve such problems, we introduce stochastic low-rank tensor bandits, a class of bandits whose mean rewards can be represented as a low-rank tensor. We consider two settings: tensor bandits without context and tensor bandits with context. In the first setting, the platform aims to find the optimal decision with the highest expected reward, i.e., the largest entry of the true reward tensor. In the second setting, some modes of the tensor are contexts and the remaining modes are decisions, and the goal is to find the optimal decision given the contextual information. We propose two learning algorithms, tensor elimination and tensor epoch-greedy, for tensor bandits without context, and derive finite-time regret bounds for them. Compared with existing competitive methods, tensor elimination has the best overall regret bound, while tensor epoch-greedy has a sharper dependency on the dimensions of the reward tensor. Furthermore, we develop a practically effective Bayesian algorithm, called tensor ensemble sampling, for tensor bandits with context. Extensive simulations and a real-data analysis of online advertising back up our theoretical findings and show that our algorithms outperform various state-of-the-art approaches that ignore the low-rank tensor structure.
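To make the problem setup concrete, here is a minimal sketch of a context-free tensor bandit instance: a mean-reward tensor with low CP rank, and a plain epsilon-greedy baseline that treats every entry as an independent arm (the kind of structure-ignoring method the paper compares against). All dimensions, the rank, the noise level, and the baseline itself are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tensor dimensions and CP rank (illustrative, not from the paper).
dims, rank = (5, 6, 7), 2

# Low-rank mean-reward tensor: a sum of `rank` rank-one terms,
# T = sum_k a_k (outer) b_k (outer) c_k.
factors = [rng.normal(size=(d, rank)) for d in dims]
T = np.einsum('ik,jk,lk->ijl', *factors)

# The optimal decision is the index of the largest entry of T.
best_arm = np.unravel_index(T.argmax(), dims)

# Epsilon-greedy over all d1*d2*d3 arms, ignoring the low-rank structure.
n_rounds, eps = 5000, 0.1
counts = np.zeros(dims)
means = np.zeros(dims)
for t in range(n_rounds):
    if rng.random() < eps:
        arm = tuple(rng.integers(d) for d in dims)  # explore
    else:
        arm = np.unravel_index(means.argmax(), dims)  # exploit
    reward = T[arm] + rng.normal(scale=0.1)  # noisy bandit feedback
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]  # running average
```

Because the number of arms grows multiplicatively in the mode dimensions, such entry-wise baselines explore slowly; exploiting the low-rank structure, as the proposed tensor elimination and tensor epoch-greedy algorithms do, is what yields the improved regret bounds.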
