具有较大动作空间的上下文匪徒：实用

论文标题

具有较大动作空间的上下文匪徒：实用

Contextual Bandits with Large Action Spaces: Made Practical

论文作者

Zhu, Yinglun, Foster, Dylan J., Langford, John, Mineiro, Paul

论文摘要

顺序决策中的一个核心问题是开发实用且计算上有效但支持灵活的通用模型的算法。当关注上下文的匪徒问题时，最近的进步提供了可证明的有效算法，当可能的替代方案数量（“动作”）很小，但在大型，连续的行动空间中的决策保证仍然难以捉摸，从而导致理论和实践之间存在很大的差距。我们提出了第一个有效的通用算法，用于具有连续的，线性结构化作用空间的上下文匪徒。我们的算法利用了（i）监督学习的计算序列，以及（ii）在动作空间上进行优化，并实现了样本的复杂性，运行时和内存，而与动作空间的大小无关。此外，这是简单而实用的。我们进行大规模的经验评估，并表明我们的方法通常比标准基准相比具有较高的性能和效率。

A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice. We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces. Our algorithm makes use of computational oracles for (i) supervised learning, and (ii) optimization over the action space, and achieves sample complexity, runtime, and memory independent of the size of the action space. In addition, it is simple and practical. We perform a large-scale empirical evaluation, and show that our approach typically enjoys superior performance and efficiency compared to standard baselines.

下载PDF全文

下载文献需遵守相关版权规定

论文标题