Paper Title
Gaussian Universality of Perceptrons with Random Labels
Paper Authors
Paper Abstract
While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with corresponding data covariance. In the limit of vanishing regularization, we further demonstrate that the training loss is independent of the data covariance. On the theoretical side, we prove this universality for an arbitrary mixture of homogeneous Gaussian clouds. Empirically, we show that universality also holds for a broad range of real datasets.
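The universality claim can be probed numerically. The sketch below (illustrative only, not the paper's code; the sample sizes, the Rademacher comparison distribution, and the logistic-loss perceptron are our own assumptions) trains a weakly regularized linear classifier with random labels on Gaussian inputs and on non-Gaussian inputs with matched covariance, then compares the minimum training losses:

```python
# Illustrative check: minimum training loss of a perceptron with random labels
# on Gaussian inputs vs. non-Gaussian inputs with the same (identity) covariance.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 600, 150                     # high-dimensional regime, n/d = 4 (assumed values)
y = rng.choice([-1, 1], size=n)     # random labels, independent of the inputs

X_gauss = rng.standard_normal((n, d))           # Gaussian i.i.d. inputs
X_rade = rng.choice([-1.0, 1.0], size=(n, d))   # Rademacher inputs, same covariance

def min_train_loss(X, y, reg=1e-2):
    """Mean logistic training loss at the ridge-regularized minimizer."""
    # sklearn's C is the inverse regularization strength per total loss
    clf = LogisticRegression(C=1.0 / (len(y) * reg), fit_intercept=False,
                             max_iter=5000)
    clf.fit(X, y)
    margins = y * (X @ clf.coef_.ravel())
    return float(np.mean(np.log1p(np.exp(-margins))))

loss_g = min_train_loss(X_gauss, y)
loss_r = min_train_loss(X_rade, y)
print(f"Gaussian: {loss_g:.3f}  Rademacher: {loss_r:.3f}")
```

Under the universality claim, the two printed losses should be close (up to finite-size fluctuations), even though the input distributions differ markedly.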