Paper Title
A New One-Point Residual-Feedback Oracle For Black-Box Learning and Control
Paper Authors
Paper Abstract
Zeroth-order optimization (ZO) algorithms have recently been used to solve black-box or simulation-based learning and control problems, where the gradient of the objective function cannot be easily computed but can be approximated using objective function values. Many existing ZO algorithms adopt two-point feedback schemes due to their fast convergence rates compared to one-point feedback schemes. However, two-point schemes require two evaluations of the objective function at each iteration, which can be impractical in applications where the data are not all available a priori, e.g., in online optimization. In this paper, we propose a novel one-point feedback scheme that queries the function value once at each iteration and estimates the gradient using the residual between two consecutive points. When optimizing a deterministic Lipschitz function, we show that the query complexity of ZO with the proposed one-point residual feedback matches that of ZO with existing two-point schemes. Moreover, the query complexity of the proposed algorithm can be improved when the objective function has a Lipschitz gradient. Then, for stochastic bandit optimization problems where only noisy objective function values are given, we show that ZO with one-point residual feedback achieves the same convergence rate as two-point schemes with uncontrollable data samples. We demonstrate the effectiveness of the proposed one-point residual feedback via extensive numerical experiments.
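To make the core idea concrete, below is a minimal Python sketch of a ZO loop with the one-point residual-feedback estimator described in the abstract: each iteration queries the objective once at a randomly perturbed point and forms a gradient estimate from the residual between the current and previous query. The function name, step size lr, smoothing radius delta, and Gaussian perturbations are illustrative assumptions, not the paper's exact algorithm or parameter choices.

```python
import numpy as np

def residual_feedback_zo(f, x0, delta=0.05, lr=0.01, iters=1000, seed=0):
    """Zeroth-order minimization sketch with one-point residual feedback.

    Each iteration makes exactly one query of f at a randomly perturbed
    point; the gradient is estimated from the residual between the
    current query value and the previous one.
    (Hypothetical parameter choices; see the paper for the analysis.)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    prev_value = None  # f-value from the previous iteration's single query
    for _ in range(iters):
        u = rng.standard_normal(d)    # random exploration direction
        value = f(x + delta * u)      # the only function query this iteration
        if prev_value is not None:
            # Residual-feedback gradient estimate:
            # g = (d / delta) * (f(x_t + delta*u_t) - f(x_{t-1} + delta*u_{t-1})) * u_t
            g = (d / delta) * (value - prev_value) * u
            x = x - lr * g            # plain gradient-descent update
        prev_value = value
    return x

# Example: minimize a simple quadratic without access to its gradient.
x_star = residual_feedback_zo(lambda x: np.sum(x**2), x0=np.ones(5))
print(x_star)  # should be close to the all-zeros minimizer
```

Note the contrast with a two-point scheme, which would evaluate both f(x + delta*u) and f(x - delta*u) (or f(x)) in the same iteration; here the second evaluation is replaced by the value already collected at the previous iterate, halving the per-iteration query cost.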