Title
Information-Theoretic Lower Bounds for Zero-Order Stochastic Gradient Estimation
Authors
Abstract
In this paper we analyze the number of samples necessary to estimate the gradient of any multidimensional smooth (possibly non-convex) function in a zero-order stochastic oracle model. In this model, an estimator has access to noisy values of the function and uses them to produce an estimate of the gradient. We also provide an analysis of the number of samples sufficient for the finite difference method, a classical technique in numerical linear algebra. For $T$ samples and $d$ dimensions, our information-theoretic lower bound is $\Omega(\sqrt{d/T})$. We show that the finite difference method for a bounded-variance oracle has rate $O(d^{4/3}/\sqrt{T})$ for functions with zero third- and higher-order derivatives. These rates are tight for Gaussian oracles. Thus, the finite difference method is not minimax optimal, and therefore there is room to develop better gradient estimation methods.
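As a minimal illustration of the setting, the following sketch estimates a gradient by central finite differences from a noisy zero-order oracle, averaging repeated queries per evaluation point. The step size `h`, sample count `n`, and the Gaussian noise model are illustrative choices, not parameters from the paper.

```python
import numpy as np

def fd_gradient(oracle, x, h=0.05, n=200):
    """Central finite-difference gradient estimate from a noisy
    zero-order oracle, averaging n queries per point to reduce noise.
    (Illustrative sketch; h and n are tuning choices, not from the paper.)"""
    d = len(x)
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        fp = np.mean([oracle(x + e) for _ in range(n)])  # average of noisy f(x + h e_i)
        fm = np.mean([oracle(x - e) for _ in range(n)])  # average of noisy f(x - h e_i)
        grad[i] = (fp - fm) / (2 * h)
    return grad

rng = np.random.default_rng(0)
# Noisy oracle for f(x) = ||x||^2 / 2; the true gradient is x itself.
oracle = lambda x: 0.5 * x @ x + rng.normal(scale=0.005)
x0 = np.array([1.0, -2.0, 0.5])
est = fd_gradient(oracle, x0)
```

For a quadratic, the central difference has no bias (third and higher derivatives vanish), so the remaining error comes only from the oracle noise, matching the class of functions for which the paper states its $O(d^{4/3}/\sqrt{T})$ rate.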