Title
Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training
Authors
Abstract
Training a sparse neural network from scratch requires optimizing the connections at the same time as the weights themselves. Typically, the weights are redistributed after a predefined number of weight updates: a fraction of each layer's parameters is removed and reinserted at different locations within the same layer. The density of each layer is determined using heuristics, often based purely on the size of the parameter tensor. While the connections per layer are optimized multiple times during training, the density of each layer remains constant. This leaves great unrealized potential, especially in scenarios with high sparsity of 90% and above. We propose Global Gradient-based Redistribution, a technique that distributes weights across all layers, adding more weights to the layers that need them most. Our evaluation shows that our approach is less prone to unbalanced weight distributions at initialization than previous work and that it finds better-performing sparse subnetworks at very high sparsity levels.
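The core idea of allocating a global weight budget across layers in proportion to a gradient-based importance score can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `gradient_based_allocation`, the use of the mean gradient magnitude as the per-layer score, and the remainder-distribution step are all assumptions made for the example.

```python
import numpy as np

def gradient_based_allocation(grad_mags, layer_sizes, total_budget):
    """Hypothetical sketch: split a global parameter budget across layers
    in proportion to each layer's mean gradient magnitude.

    grad_mags    -- list of arrays of per-weight gradient magnitudes, one per layer
    layer_sizes  -- number of weight positions available in each layer
    total_budget -- total number of non-zero weights allowed globally
    """
    layer_sizes = np.asarray(layer_sizes)
    # Score each layer by its mean gradient magnitude (an assumed importance proxy).
    scores = np.array([g.mean() for g in grad_mags])
    fractions = scores / scores.sum()
    counts = np.floor(fractions * total_budget).astype(int)
    # A layer cannot hold more weights than it has positions.
    counts = np.minimum(counts, layer_sizes)
    # Hand out any remaining budget to layers with spare capacity, highest score first.
    remainder = total_budget - counts.sum()
    for i in np.argsort(-scores):
        if remainder <= 0:
            break
        take = min(layer_sizes[i] - counts[i], remainder)
        counts[i] += take
        remainder -= take
    return counts
```

A dynamic sparse trainer could call this every redistribution step, so that layers whose weights currently receive large gradients gain density at the expense of layers with small gradients.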