Paper Title

Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization

Paper Authors

Ke Wang, Christos Thrampoulidis

Paper Abstract

Deep neural networks generalize well despite being exceedingly overparameterized and being trained without explicit regularization. This curious phenomenon has inspired extensive research activity in establishing its statistical principles: Under what conditions is it observed? How do these depend on the data and on the training algorithm? When does regularization benefit generalization? While such questions remain wide open for deep neural nets, recent works have attempted gaining insights by studying simpler, often linear, models. Our paper contributes to this growing line of work by examining binary linear classification under a generative Gaussian mixture model. Motivated by recent results on the implicit bias of gradient descent, we study both max-margin SVM classifiers (corresponding to logistic loss) and min-norm interpolating classifiers (corresponding to least-squares loss). First, we leverage an idea introduced in [V. Muthukumar et al., arXiv:2005.08054, (2020)] to relate the SVM solution to the min-norm interpolating solution. Second, we derive novel non-asymptotic bounds on the classification error of the latter. Combining the two, we present novel sufficient conditions on the covariance spectrum and on the signal-to-noise ratio (SNR) under which interpolating estimators achieve asymptotically optimal performance as overparameterization increases. Interestingly, our results extend to a noisy model with constant probability noise flips. Contrary to previously studied discriminative data models, our results emphasize the crucial role of the SNR and its interplay with the data covariance. Finally, via a combination of analytical arguments and numerical demonstrations we identify conditions under which the interpolating estimator performs better than corresponding regularized estimates.
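To make the setup concrete, below is a small, self-contained numerical sketch in Python (not the authors' code) of the comparison the abstract describes: binary labels generated by a two-component Gaussian mixture, the min-norm least-squares interpolator, and a (near) hard-margin linear SVM. The sample size n, dimension d, and mean vector mu are illustrative assumptions; in a heavily overparameterized regime one typically observes the two solutions nearly align and essentially every training point acts as a support vector, consistent with the connection to [V. Muthukumar et al., arXiv:2005.08054, (2020)] mentioned above.

```python
# Minimal sketch (assumed parameters, not the paper's experiments): compare the
# max-margin SVM classifier with the min-norm least-squares interpolator on
# Gaussian mixture data in the overparameterized regime.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 30, 2000                       # n samples, heavily overparameterized (d >> n)
mu = np.zeros(d)
mu[0] = 3.0                           # class-mean magnitude, i.e. the SNR knob
y = rng.choice([-1.0, 1.0], size=n)   # balanced binary labels
X = y[:, None] * mu + rng.standard_normal((n, d))  # GMM samples: x_i = y_i * mu + z_i

# Min-norm interpolator of the +/-1 labels (least-squares solution of X w = y).
w_ls = np.linalg.pinv(X) @ y

# Hard-margin SVM approximated by a linear SVM with a very large C.
svm = SVC(kernel="linear", C=1e6).fit(X, y)
w_svm = svm.coef_.ravel()

# With enough overparameterization the two directions (nearly) align and all
# training points tend to be support vectors ("abundance of support vectors").
cosine = w_svm @ w_ls / (np.linalg.norm(w_svm) * np.linalg.norm(w_ls))
print(f"cosine(SVM direction, min-norm direction): {cosine:.4f}")
print(f"fraction of training points that are support vectors: {len(svm.support_) / n:.2f}")

# Test error of the interpolating classifier sign(<w_ls, x>) on fresh GMM data.
y_test = rng.choice([-1.0, 1.0], size=2000)
X_test = y_test[:, None] * mu + rng.standard_normal((2000, d))
err_ls = np.mean(np.sign(X_test @ w_ls) != y_test)
print(f"test error of the min-norm interpolator: {err_ls:.3f}")
```

Varying the overparameterization ratio d/n and the mean magnitude mu in this sketch gives a quick empirical feel for the SNR/covariance interplay the abstract emphasizes; it is an illustration only, not a reproduction of the paper's sufficient conditions.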
