Paper Title
An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge
Paper Authors
Paper Abstract
We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret bound than previous work on the non-stationary kernel bandit setting. Moreover, when applied to the non-stationary linear bandit setting by using a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature. We extend our algorithm to use a neural network for dynamically adapting the feature mapping to observed data. We prove a dynamic regret bound of the extension using the neural tangent kernel theory. We demonstrate empirically that our algorithm and the extension can adapt to varying degrees of non-stationarity.
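The restart-on-change-detection idea described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm: for concreteness it uses a simple kernel-ridge UCB rule in place of the optimization-based randomized strategy, and a hypothetical prediction-error test to decide when to restart. All names (`RestartKernelBandit`, `window`, `threshold`, `rbf_kernel`) are illustrative assumptions, not from the paper.

```python
# Minimal sketch (NOT the paper's algorithm): a kernel bandit loop that
# restarts, i.e. discards all past data, when recent prediction errors
# suggest the reward function has changed.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """RBF kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

class RestartKernelBandit:
    """Kernel ridge regression + UCB selection, with a crude restart test."""
    def __init__(self, arms, reg=1.0, beta=2.0, window=10, threshold=1.5):
        self.arms = arms            # candidate actions, shape (n_arms, d)
        self.reg = reg              # ridge regularization
        self.beta = beta            # exploration weight
        self.window = window        # number of recent rounds in the test
        self.threshold = threshold  # restart if mean |error| exceeds this
        self.restart()

    def restart(self):
        """Forget all past observations (adaptation to non-stationarity)."""
        self.X, self.y, self.errors = [], [], []

    def _posterior(self, Xq):
        """Kernel ridge mean and predictive std at query points Xq."""
        if not self.X:
            return np.zeros(len(Xq)), np.ones(len(Xq))
        X, y = np.array(self.X), np.array(self.y)
        Kinv = np.linalg.inv(rbf_kernel(X, X) + self.reg * np.eye(len(X)))
        Kq = rbf_kernel(Xq, X)
        mean = Kq @ Kinv @ y
        var = np.clip(1.0 - np.einsum('ij,jk,ik->i', Kq, Kinv, Kq), 1e-12, None)
        return mean, np.sqrt(var)

    def select(self):
        """Pick the arm maximizing a UCB-style score (stand-in strategy)."""
        mean, std = self._posterior(self.arms)
        return int(np.argmax(mean + self.beta * std))

    def update(self, arm_idx, reward):
        x = self.arms[arm_idx]
        mean, _ = self._posterior(x[None, :])
        self.X.append(x)
        self.y.append(reward)
        # Restart test: persistently large prediction errors suggest the
        # reward function changed, so stale data should be discarded.
        self.errors.append(abs(reward - mean[0]))
        recent = self.errors[-self.window:]
        if len(recent) == self.window and np.mean(recent) > self.threshold:
            self.restart()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    arms = rng.uniform(0, 1, size=(20, 1))
    bandit = RestartKernelBandit(arms)
    for t in range(200):
        # Toy non-stationarity: the reward function flips sign at t = 100.
        f = (lambda x: np.sin(6 * x)) if t < 100 else (lambda x: -np.sin(6 * x))
        i = bandit.select()
        bandit.update(i, float(f(arms[i, 0])) + 0.1 * rng.standard_normal())
```

The change test here is deliberately crude; the paper's detection rule and optimization-based randomized strategy are what yield the stated dynamic regret bounds, so this sketch only illustrates the overall restart structure.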