Paper Title
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win
Paper Authors
Paper Abstract
Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). Through our analysis of gradient flow during training, we attempt to answer: (1) why does training unstructured sparse networks from random initialization perform poorly; and (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and demonstrate the importance of using sparsity-aware initialization. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow; rather, their success lies in re-learning the pruning solution they are derived from. However, this comes at the cost of learning novel solutions.
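To make the two ideas the abstract highlights concrete, the sketch below illustrates (i) one plausible form of sparsity-aware initialization, where a He-style init standard deviation is rescaled by each output unit's effective (nonzero) fan-in rather than its dense fan-in, and (ii) a simple proxy for gradient flow, namely the gradient norm on the active (unmasked) weights after one forward/backward pass. This is a minimal PyTorch sketch under assumed layer sizes and sparsity; the helper name sparse_aware_init_, the 90% sparsity level, and the gradient-norm proxy are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn

def sparse_aware_init_(weight: torch.Tensor, mask: torch.Tensor) -> None:
    """Illustrative sparsity-aware (re)initialization (hypothetical helper).

    Dense He/Kaiming init assumes every incoming connection is present; for a
    masked layer we instead scale each output unit's init std by its effective
    (nonzero) fan-in, so activations keep a comparable scale at initialization.
    """
    # Effective fan-in: number of nonzero incoming connections per output unit.
    fan_in = mask.sum(dim=1).clamp(min=1.0)            # shape: [out_features]
    std = (2.0 / fan_in).sqrt().unsqueeze(1)           # He-style scaling, per row
    with torch.no_grad():
        weight.normal_(0.0, 1.0).mul_(std).mul_(mask)  # sample, rescale, apply mask

# Example: a linear layer with ~90% of its weights removed (assumed sizes).
layer = nn.Linear(512, 256)
mask = (torch.rand_like(layer.weight) > 0.9).float()   # keep ~10% of connections
sparse_aware_init_(layer.weight, mask)

# A simple proxy for "gradient flow": the gradient norm on the active weights
# after one forward/backward pass on random data.
x, y = torch.randn(64, 512), torch.randn(64, 256)
loss = ((layer(x) - y) ** 2).mean()
loss.backward()
grad_flow = (layer.weight.grad * mask).norm().item()
print(f"gradient norm on active weights: {grad_flow:.4f}")
```

In a full training loop the mask would also have to be re-applied to the weights (or their gradients) after each optimizer step to keep the network sparse; the snippet only covers initialization and a single gradient-flow measurement.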