Paper Title
Failure and success of the spectral bias prediction for Kernel Ridge Regression: the case of low-dimensional data
Paper Authors
Paper Abstract
Recently, several theories, including the replica method, have made predictions for the generalization error of Kernel Ridge Regression. In some regimes, they predict that the method has a 'spectral bias': decomposing the true function $f^*$ on the eigenbasis of the kernel, it fits well the coefficients associated with the $O(P)$ largest eigenvalues, where $P$ is the size of the training set. This prediction works very well on benchmark data sets such as images, yet the assumptions these approaches make on the data are never satisfied in practice. To clarify when the spectral bias prediction holds, we first focus on a one-dimensional model where rigorous results can be obtained, and then use scaling arguments to generalize and test our findings in higher dimensions. Our predictions include the classification case $f(x)=\mathrm{sign}(x_1)$ with a data distribution that vanishes at the decision boundary, $p(x)\sim x_1^χ$. For $χ>0$ and a Laplace kernel, we find that (i) there exists a cross-over ridge $λ^*_{d,χ}(P)\sim P^{-\frac{1}{d+χ}}$ such that the replica method applies for $λ\gg λ^*_{d,χ}(P)$ but not for $λ\ll λ^*_{d,χ}(P)$, and (ii) in the ridgeless case, spectral bias predicts the correct training-curve exponent only in the limit $d\rightarrow\infty$.
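The setup described in the abstract can be reproduced numerically. Below is a minimal sketch (not the authors' code; the function names, kernel width $σ=1$, and parameter values are illustrative assumptions) that samples data whose first coordinate has density $\sim |x_1|^χ$ via inverse-CDF sampling, fits Kernel Ridge Regression with a Laplace kernel $K(x,y)=e^{-\|x-y\|/σ}$ on the target $f^*(x)=\mathrm{sign}(x_1)$, and sweeps the ridge $λ$ across the predicted crossover scale $λ^*_{d,χ}(P)\sim P^{-\frac{1}{d+χ}}$.

```python
import numpy as np

def sample_data(P, d, chi, rng):
    """Draw P points in d dimensions whose first coordinate has density ~ |x_1|^chi on [-1, 1]."""
    # Inverse-CDF sampling: if |x_1| has density ~ x^chi on [0, 1], then |x_1| = U^{1/(chi+1)}.
    signs = rng.choice([-1.0, 1.0], size=P)
    x1 = signs * rng.random(P) ** (1.0 / (chi + 1.0))
    rest = rng.uniform(-1.0, 1.0, size=(P, d - 1))  # remaining coordinates: uniform (an assumption)
    return np.column_stack([x1, rest])

def laplace_kernel(X, Y, sigma=1.0):
    """Laplace kernel K(x, y) = exp(-||x - y||_2 / sigma)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.sqrt(np.maximum(sq, 0.0)) / sigma)

def krr_test_error(P, d, chi, lam, P_test=2000, seed=0):
    """Fit KRR at ridge lam on f*(x) = sign(x_1) and return the test mean-squared error."""
    rng = np.random.default_rng(seed)
    X, Xt = sample_data(P, d, chi, rng), sample_data(P_test, d, chi, rng)
    y, yt = np.sign(X[:, 0]), np.sign(Xt[:, 0])
    alpha = np.linalg.solve(laplace_kernel(X, X) + lam * np.eye(P), y)  # standard KRR solution
    return float(np.mean((laplace_kernel(Xt, X) @ alpha - yt) ** 2))

# Sweep the ridge across the predicted crossover scale lambda* ~ P^{-1/(d+chi)}.
P, d, chi = 512, 2, 1.0
print("predicted lambda* ~", P ** (-1.0 / (d + chi)))
for lam in (1e-6, 1e-3, 1e-1):
    print(f"lambda={lam:.0e}  test MSE={krr_test_error(P, d, chi, lam):.3f}")
```

Repeating such a sweep for several $P$ and $χ$, and comparing the error on either side of $λ^*_{d,χ}(P)$, is the kind of experiment to which the abstract's predictions apply.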