Paper Title
Exploiting Non-Linear Redundancy for Neural Model Compression
Paper Authors
Paper Abstract
Deploying deep learning models, comprising non-linear combinations of millions, or even billions, of parameters is challenging given the memory, power and compute constraints of the real world. This situation has led to research into model compression techniques, most of which rely on suboptimal heuristics and do not consider the parameter redundancies due to linear dependence between neuron activations in overparametrized networks. In this paper, we propose a novel model compression approach based on the exploitation of linear dependence, which compresses networks by eliminating entire neurons and redistributing their activations over other neurons in a manner that is provably lossless while training. We combine this approach with an annealing algorithm that may be applied during training, or even to a trained model, and demonstrate, using popular datasets, that our method results in a reduction of up to 99\% in overall network size with small loss in performance. Furthermore, we provide theoretical results showing that in overparametrized, locally linear (ReLU) neural networks where redundant features exist, and with correct hyperparameter selection, our method is indeed able to capture and suppress those dependencies.
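As a rough illustration of the neuron-elimination idea described in the abstract (this is a minimal sketch, not the authors' implementation; the function name, the least-squares fit, and the tolerance threshold are assumptions made for the example), the NumPy snippet below removes one hidden neuron whose activation column is linearly dependent on the others and folds its outgoing weights into the next layer, so the next layer's pre-activations are unchanged on the given data:

```python
import numpy as np

def eliminate_redundant_neuron(acts, W_next, tol=1e-6):
    """Remove one neuron whose activation is a linear combination of the others.

    acts   : (n_samples, n_neurons) hidden-layer activations on a batch of data
    W_next : (n_out, n_neurons)     weight matrix of the following layer
    Returns (index_removed, reduced_W_next), or (None, W_next) if no neuron
    is linearly dependent within the given tolerance.
    """
    n_neurons = acts.shape[1]
    for j in range(n_neurons):
        others = np.delete(acts, j, axis=1)                  # all activations except neuron j
        # Least-squares fit: acts[:, j] ~= others @ coeffs
        coeffs = np.linalg.lstsq(others, acts[:, j], rcond=None)[0]
        residual = acts[:, j] - others @ coeffs
        if np.linalg.norm(residual) <= tol * (np.linalg.norm(acts[:, j]) + 1e-12):
            # Redistribute neuron j's outgoing weights onto the surviving neurons:
            # W_next @ acts.T is preserved because acts[:, j] = others @ coeffs.
            reduced_W = np.delete(W_next, j, axis=1) + np.outer(W_next[:, j], coeffs)
            return j, reduced_W
    return None, W_next
```

In this sketch the elimination is lossless on the sampled activations whenever the dependence is exact; repeatedly applying such a step, with a relaxed tolerance playing the role of an annealing schedule, mirrors the kind of compression described in the abstract.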