Paper Title
Revealing the Structure of Deep Neural Networks via Convex Duality
Paper Authors
Paper Abstract
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm-regularized DNN training problem can be found explicitly as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove that the same weight alignment holds. As a corollary, we also prove that norm-regularized deep ReLU networks yield spline interpolation for one-dimensional datasets, a result previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when the data is rank-one or whitened. The same analysis also applies to architectures with batch normalization, even for arbitrary data. We thereby obtain a complete explanation for a recent empirical observation termed Neural Collapse, in which class means collapse to the vertices of a simplex equiangular tight frame.
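The abstract's final claim involves the geometry of a simplex equiangular tight frame (ETF). As a quick illustration, the following is a minimal Python sketch (not from the paper; the construction shown is the standard one, and the class count K = 4 is an arbitrary illustrative choice) that builds the K-point simplex ETF and numerically verifies its two defining properties: all vectors have equal norm, and every distinct pair meets at the same angle, with cosine -1/(K-1).

```python
import numpy as np

# Minimal sketch: construct the K-point simplex equiangular tight frame (ETF)
# referenced by the Neural Collapse observation, then check its defining
# properties numerically. K = 4 is an arbitrary illustrative choice.
K = 4

# Columns of C are the K centered, rescaled standard basis vectors in R^K;
# this is the standard simplex ETF construction (it spans a (K-1)-dim subspace).
C = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

G = C.T @ C  # Gram matrix: inner products between all pairs of frame vectors

# Property 1: every frame vector has unit norm.
assert np.allclose(np.diag(G), 1.0)

# Property 2: every distinct pair has the same inner product, -1/(K-1),
# i.e. the vectors are equiangular and maximally separated.
off_diag = G[~np.eye(K, dtype=bool)]
assert np.allclose(off_diag, -1.0 / (K - 1))

print(np.round(G, 3))
```

Neural Collapse, as described in the abstract, is the empirical observation that the (centered, rescaled) class means of a well-trained classifier converge to exactly such a configuration, which the paper explains via its convex duality characterization of the optimal layer weights.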