Paper Title
Nonlinear Initialization Methods for Low-Rank Neural Networks
Paper Authors
Paper Abstract
We propose a novel low-rank initialization framework for training low-rank deep neural networks -- networks where the weight parameters are re-parameterized as products of two low-rank matrices. The most successful existing approach, spectral initialization, draws a sample from the initialization distribution for the full-rank setting and then optimally approximates the full-rank initialization parameters in the Frobenius norm with a pair of low-rank initialization matrices via singular value decomposition. Our method is inspired by the insight that approximating the function computed by each layer is more important than approximating its parameter values. We provably demonstrate that there is a significant gap between these two approaches for ReLU networks, particularly as the desired rank of the approximating weights decreases, or as the dimension of the inputs to the layer increases (the latter point holds when the network width is super-linear in the dimension). Along the way, we provide the first provably efficient algorithm for solving the ReLU low-rank approximation problem for fixed parameter rank $r$ -- previously, it was not known whether the problem was computationally tractable even for rank $1$. We also provide a practical algorithm for this problem that is no more expensive than the existing spectral initialization approach, and validate our theory by training ResNet and EfficientNet models (He et al., 2016; Tan & Le, 2019) on ImageNet (Russakovsky et al., 2015).
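For context on the comparison made in the abstract, the following is a minimal NumPy sketch of the spectral-initialization baseline only: draw a full-rank initial weight matrix, then replace it with its best rank-$r$ Frobenius-norm approximation, factored into a pair of low-rank matrices via the SVD. The function name `spectral_init`, the Kaiming-style random draw, and the dimensions are illustrative assumptions, not taken from the paper; the paper's proposed ReLU-aware, function-space initialization is not shown here.

```python
import numpy as np

def spectral_init(W, r):
    """Spectral-initialization baseline: best rank-r approximation of a
    full-rank initial weight W (d_out x d_in) in the Frobenius norm,
    returned as a factor pair (A, B) with W ~= A @ B."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:r])
    A = U[:, :r] * sqrt_s            # d_out x r, columns scaled by sqrt(singular values)
    B = sqrt_s[:, None] * Vt[:r]     # r x d_in, rows scaled by sqrt(singular values)
    return A, B

# Illustrative usage: approximate a Kaiming-style full-rank initialization at rank 32.
d_out, d_in, r = 512, 256, 32
W = np.random.randn(d_out, d_in) * np.sqrt(2.0 / d_in)
A, B = spectral_init(W, r)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))  # relative approximation error
```

Splitting the square root of the singular values across the two factors is one common convention for balancing the scale of `A` and `B`; any factorization with `A @ B` equal to the truncated SVD gives the same rank-$r$ weight matrix at initialization.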