Paper Title
Pruning neural networks without any data by iteratively conserving synaptic flow
Paper Authors
Paper Abstract
Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory-driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm, Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.
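
For readers who want a concrete picture of the data-free procedure the abstract describes, the sketch below shows one way the SynFlow score (the magnitude of each parameter times its gradient with respect to the total synaptic flow, computed from an all-ones input through a network whose parameters are replaced by their absolute values) could be implemented in PyTorch. This is a minimal illustrative reconstruction under assumptions, not the authors' released code: the function names (synflow_scores, synflow_prune, apply_masks), the global magnitude thresholding, the exponential sparsity schedule, and the choice to prune only weight tensors (parameters with more than one dimension) are assumptions made here for a short, runnable example.

import torch

@torch.no_grad()
def apply_masks(model, masks):
    # Zero out pruned parameters in place.
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])

def synflow_scores(model, input_shape):
    # Data-free score: |theta * dR/dtheta|, where R is the output of an
    # all-ones input passed through the network with absolute-valued parameters.
    with torch.no_grad():
        signs = {n: torch.sign(p) for n, p in model.named_parameters()}
        for p in model.parameters():
            p.abs_()

    model.zero_grad()
    ones = torch.ones((1, *input_shape))  # an all-ones "input"; no training data needed
    R = model(ones).sum()
    R.backward()

    scores = {n: (p.grad * p).abs().detach()
              for n, p in model.named_parameters() if p.grad is not None}

    # Restore the original signs of the parameters.
    with torch.no_grad():
        for n, p in model.named_parameters():
            p.mul_(signs[n])
    return scores

def synflow_prune(model, input_shape, compression, iterations=100):
    # Iteratively prune weight tensors toward a target compression ratio
    # (e.g. compression=100 keeps 1% of the weights), recomputing scores
    # after each pruning step; the exponential schedule is an assumption here.
    model.eval()  # keep BatchNorm on running statistics for the all-ones pass
    masks = {n: torch.ones_like(p)
             for n, p in model.named_parameters() if p.dim() > 1}

    for k in range(1, iterations + 1):
        apply_masks(model, masks)
        scores = synflow_scores(model, input_shape)

        # Keep the globally top-scoring fraction compression**(-k/iterations).
        keep_frac = compression ** (-k / iterations)
        flat = torch.cat([scores[n].flatten() for n in masks])
        n_keep = max(int(keep_frac * flat.numel()), 1)
        threshold = torch.topk(flat, n_keep).values[-1]

        for n in masks:
            masks[n] = (scores[n] >= threshold).float()

    apply_masks(model, masks)
    return masks

As a usage note, one would call synflow_prune(model, (3, 32, 32), compression=10**2) on a freshly initialized CIFAR-10 model and then train the masked network as usual; the hyperparameter names and shapes here are illustrative only.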