Paper title
Dimension free ridge regression
Paper authors
Paper abstract
Random matrix theory has become a widely useful tool in high-dimensional statistics and theoretical machine learning. However, random matrix theory is largely focused on the proportional asymptotics, in which the number of columns grows proportionally to the number of rows of the data matrix. This is not always the most natural setting in statistics, where columns correspond to covariates and rows to samples. With the objective of moving beyond the proportional asymptotics, we revisit ridge regression ($\ell_2$-penalized least squares) on i.i.d. data $(x_i, y_i)$, $i \le n$, where $x_i$ is a feature vector and $y_i = \beta^\top x_i + \varepsilon_i \in \mathbb{R}$ is a response. We allow the feature vector to be high-dimensional, or even infinite-dimensional, in which case it belongs to a separable Hilbert space, and assume that $z_i := \Sigma^{-1/2} x_i$ either has i.i.d. entries or satisfies a certain convex concentration property. Within this setting, we establish non-asymptotic bounds that approximate the bias and variance of ridge regression in terms of the bias and variance of an `equivalent' sequence model (a regression model with diagonal design matrix). The approximation is up to multiplicative factors bounded by $(1 \pm \Delta)$ for some explicitly small $\Delta$. Previously, such an approximation result was known only in the proportional regime and only up to additive errors: in particular, it did not allow one to characterize the behavior of the excess risk when this converges to $0$. Our general theory recovers earlier results in the proportional regime (with better error rates). As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum. Finally, we analyze the overparametrized near-interpolation setting and obtain sharp `benign overfitting' guarantees.
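The setting described above can be sketched in a few lines of code. This is only an illustration of the data model and the ridge estimator, not of the paper's theory: the dimensions, covariance spectrum, noise level, and penalty below are all hypothetical choices, and the excess risk is estimated by direct computation rather than by the sequence-model approximation the paper develops.

```python
import numpy as np

# Illustrative sketch (assumed parameters, not the paper's method):
# i.i.d. data (x_i, y_i) with x_i = Sigma^{1/2} z_i, z_i having i.i.d.
# entries, and responses y_i = beta^T x_i + eps_i.
rng = np.random.default_rng(0)

n, d = 200, 50                                # samples, features (hypothetical)
Sigma = np.diag(1.0 / np.arange(1, d + 1))    # hypothetical covariance spectrum
beta = rng.standard_normal(d) / np.sqrt(d)    # hypothetical signal

Z = rng.standard_normal((n, d))               # z_i with i.i.d. entries
X = Z @ np.sqrt(Sigma)                        # x_i = Sigma^{1/2} z_i
y = X @ beta + 0.1 * rng.standard_normal(n)   # responses with noise

# Ridge (ell_2-penalized least squares) estimator:
# beta_hat = argmin ||y - X b||^2 / n + lam * ||b||^2
lam = 1e-2
beta_hat = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

# Excess prediction risk of the fitted estimator under covariance Sigma.
excess_risk = (beta_hat - beta) @ Sigma @ (beta_hat - beta)
print(excess_risk)
```

The paper's contribution is a non-asymptotic, multiplicative-error characterization of the bias and variance entering this excess risk, valid even when $d$ is much larger than $n$ or infinite.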