Paper Title
On Adaptivity in Non-stationary Stochastic Optimization With Bandit Feedback
Paper Authors
Paper Abstract
In this paper, we study the non-stationary stochastic optimization problem with bandit feedback under the dynamic regret measure. The seminal work of Besbes et al. (2015) shows that, when the aggregate budget of function changes is known a priori, a simple restarting algorithm attains the optimal dynamic regret. In this work, we design a stochastic optimization algorithm with fixed step sizes that, combined with the multi-scale sampling framework of Wei and Luo (2021), achieves the optimal dynamic regret in non-stationary stochastic optimization without requiring prior knowledge of the function change budget, thereby closing a long-standing open question. We also establish an additional result showing that any algorithm achieving good regret against a stationary benchmark with high probability can be automatically converted into an algorithm achieving good regret against dynamic benchmarks, a reduction that applies to a wide class of bandit convex optimization algorithms.
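To make the restarting idea from Besbes et al. (2015) concrete, here is a minimal sketch of the batching schedule it relies on: the horizon `T` is split into equal-length batches tuned to a known variation budget `V_T`, and a fresh instance of a stationary stochastic optimization algorithm is run within each batch. The function name and the 2/3 exponent are illustrative assumptions (the optimal batch length depends on the specific setting); the sketch only shows the structure of the schedule, not the paper's budget-free algorithm.

```python
import math

def restarting_schedule(T, V_T):
    """Split horizon 1..T into equal batches of length roughly
    (T / V_T)^(2/3); the exponent is illustrative and varies by setting.
    At the start of each batch, a stationary algorithm is restarted
    from scratch, discarding all past observations."""
    delta = max(1, math.ceil((T / max(V_T, 1e-9)) ** (2.0 / 3.0)))
    batches = []
    start = 1
    while start <= T:
        end = min(start + delta - 1, T)
        batches.append((start, end))  # inclusive round range for one restart
        start = end + 1
    return batches
```

The key limitation motivating the abstract's contribution is visible here: computing `delta` requires knowing `V_T` in advance, which is exactly the prior knowledge the multi-scale sampling approach removes.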