Paper Title
Universality of empirical risk minimization
Paper Authors
Paper Abstract
Consider supervised learning from i.i.d. samples $\{({\boldsymbol x}_i, y_i)\}_{i\le n}$ where ${\boldsymbol x}_i \in \mathbb{R}^p$ are feature vectors and $y_i \in \mathbb{R}$ are labels. We study empirical risk minimization over a class of functions that are parameterized by $\mathsf{k} = O(1)$ vectors ${\boldsymbol \theta}_1, \dots, {\boldsymbol \theta}_{\mathsf{k}} \in \mathbb{R}^p$, and prove universality results both for the training and test error. Namely, under the proportional asymptotics $n, p \to \infty$, with $n/p = \Theta(1)$, we prove that the training error depends on the random features distribution only through its covariance structure. Further, we prove that the minimum test error over near-empirical risk minimizers enjoys similar universality properties. In particular, the asymptotics of these quantities can be computed, to leading order, under a simpler model in which the feature vectors ${\boldsymbol x}_i$ are replaced by Gaussian vectors ${\boldsymbol g}_i$ with the same covariance. Earlier universality results were limited to strongly convex learning procedures, or to feature vectors ${\boldsymbol x}_i$ with independent entries. Our results do not make any of these assumptions. Our assumptions are general enough to include feature vectors ${\boldsymbol x}_i$ produced by randomized featurization maps. In particular, we explicitly check the assumptions for certain random features models (computing the output of a one-layer neural network with random weights) and neural tangent models (first-order Taylor approximation of two-layer networks).
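
To make the Gaussian-equivalence statement concrete, the following is a minimal NumPy sketch (not the paper's code) that compares the training error of ridge-regularized ERM under a ReLU random-features design $x_i = \mathrm{relu}(W z_i)$ with the training error under a Gaussian design $g_i$ of matched mean and covariance. The specific choices here are illustrative assumptions rather than details from the paper: a single parameter vector (i.e. $\mathsf{k} = 1$), squared loss with a ridge penalty, a noisy linear label model, and helper names such as ridge_train_error.

import numpy as np

# Minimal numerical sketch of the universality claim (illustrative assumptions:
# k = 1, squared loss with ridge regularization, noisy linear label model).
rng = np.random.default_rng(0)
n, p, d = 600, 400, 200                      # proportional regime: n/p = Theta(1)
W = rng.standard_normal((p, d)) / np.sqrt(d)

def relu(u):
    return np.maximum(u, 0.0)

# Random-features design: x_i = relu(W z_i)
Z = rng.standard_normal((n, d))
X = relu(Z @ W.T)                            # n x p feature matrix

# Gaussian-equivalent design: same (estimated) mean and covariance
Z_big = rng.standard_normal((20 * n, d))
X_big = relu(Z_big @ W.T)
mu, Sigma = X_big.mean(axis=0), np.cov(X_big, rowvar=False)
G = rng.multivariate_normal(mu, Sigma, size=n)

# Same label model applied to each design: y = <beta0, feature>/sqrt(p) + noise
beta0 = rng.standard_normal(p)
y_X = X @ beta0 / np.sqrt(p) + 0.5 * rng.standard_normal(n)
y_G = G @ beta0 / np.sqrt(p) + 0.5 * rng.standard_normal(n)

def ridge_train_error(A, y, lam=0.1):
    # Training error of the ridge-regularized empirical risk minimizer:
    # theta_hat = argmin_theta ||y - A theta||^2 / n + lam ||theta||^2
    n_samples = A.shape[0]
    theta_hat = np.linalg.solve(A.T @ A / n_samples + lam * np.eye(A.shape[1]),
                                A.T @ y / n_samples)
    return np.mean((y - A @ theta_hat) ** 2)

print("train error, random-features design:   ", ridge_train_error(X, y_X))
print("train error, Gaussian-equivalent design:", ridge_train_error(G, y_G))
# Per the universality result, the two training errors should agree to leading
# order, and the gap should shrink as n and p grow proportionally.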