Paper Title
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training
Paper Authors
Paper Abstract
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has been deemed uncompetitive by both post-training pruning and sparse training. In this paper, we focus on sparse training and highlight a perhaps counter-intuitive finding: random pruning at initialization can be quite powerful for the sparse training of modern neural networks. Without any delicate pruning criteria or carefully pursued sparsity structures, we empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent. Two key factors contribute to this revival: (i) network size matters: as the original dense networks grow wider and deeper, the performance of training a randomly pruned sparse network quickly rises to match that of its dense equivalent, even at high sparsity ratios; (ii) appropriate layer-wise sparsity ratios can be pre-chosen for sparse training, which turns out to be another important performance booster. Simple as it looks, a randomly pruned subnetwork of Wide ResNet-50 can be sparsely trained to outperform a dense Wide ResNet-50 on ImageNet. We also observe that such randomly pruned networks outperform their dense counterparts in other favorable respects, such as out-of-distribution detection, uncertainty estimation, and adversarial robustness. Overall, our results strongly suggest there is larger-than-expected room for sparse training at scale, and that the benefits of sparsity may extend well beyond carefully designed pruning. Our source code can be found at https://github.com/VITA-Group/Random_Pruning.
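To make the recipe in the abstract concrete, the sketch below shows one way to randomly prune a network at initialization and keep the resulting masks fixed during training. This is a minimal illustration, not the authors' implementation: the function name `random_prune_at_init`, the uniform layer-wise sparsity, and the mask re-application step are assumptions for exposition; the paper's "appropriate layer-wise sparsity ratios" refer to non-uniform schedules (e.g., ERK-style ratios) rather than the uniform ratio used here.

```python
import torch
import torch.nn as nn

def random_prune_at_init(model: nn.Module, sparsity: float = 0.9):
    """Randomly zero a fraction of weights in each prunable layer at
    initialization and return the per-layer binary masks.

    Assumption: the same sparsity ratio is applied to every Conv2d/Linear
    layer; the paper also studies non-uniform layer-wise ratios.
    """
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            weight = module.weight.data
            # 1 = kept connection, 0 = pruned connection, chosen uniformly at random.
            mask = (torch.rand_like(weight) > sparsity).float()
            weight.mul_(mask)
            masks[name] = mask
    return masks

def reapply_masks(model: nn.Module, masks: dict):
    """Re-apply the fixed masks after an optimizer step so pruned weights stay zero
    (static sparse training)."""
    for name, module in model.named_modules():
        if name in masks:
            module.weight.data.mul_(masks[name])
```

A typical usage pattern would be to call `random_prune_at_init(model, sparsity)` once before training, then call `reapply_masks(model, masks)` after each `optimizer.step()`, so only the randomly selected subnetwork is ever trained.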