Paper Title
Compressing Deep Neural Networks via Layer Fusion
Paper Authors
Paper Abstract
This paper proposes \textit{layer fusion}, a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional, and attention layers. Layer fusion can significantly reduce the number of layers in the original network with little additional computational overhead, while maintaining competitive performance. From experiments on CIFAR-10, we find that various deep convolutional neural networks can remain within 2\% accuracy points of the original networks up to a compression ratio of 3.33 when iteratively retrained with layer fusion. For experiments on the WikiText-2 language modelling dataset, where pretrained transformer models are used, we achieve compression to 20\% of the original network size while remaining within 5 perplexity points of the original network. We also find that other well-established compression techniques can achieve competitive performance relative to their original networks given a sufficient number of retraining steps. Generally, we observe a clear inflection point in performance as the amount of compression increases, suggesting a bound on the amount of compression that can be achieved before an exponential degradation in performance.
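To make the idea of "discovering which weights to combine and fusing similar layers" concrete, below is a minimal illustrative sketch in PyTorch. It is not the authors' implementation: the cosine-similarity criterion, the element-wise weight averaging used as the fusion rule, and the restriction to same-shaped linear layers are all simplifying assumptions made here for illustration.

```python
# Minimal sketch of the layer-fusion idea from the abstract:
# (1) score how similar the weights of same-shaped layers are, and
# (2) replace the most similar pair with a single fused layer,
# reducing network depth by one per fusion step.
# The similarity measure and fusion rule below are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


def weight_similarity(a: nn.Linear, b: nn.Linear) -> float:
    """Cosine similarity between the flattened weight matrices of two layers."""
    return F.cosine_similarity(a.weight.flatten(), b.weight.flatten(), dim=0).item()


def fuse_layers(a: nn.Linear, b: nn.Linear) -> nn.Linear:
    """Fuse two same-shaped linear layers by averaging their parameters."""
    fused = nn.Linear(a.in_features, a.out_features, bias=a.bias is not None)
    with torch.no_grad():
        fused.weight.copy_((a.weight + b.weight) / 2)
        if a.bias is not None:
            fused.bias.copy_((a.bias + b.bias) / 2)
    return fused


def fuse_most_similar(layers: list) -> list:
    """One fusion step: find the most similar compatible pair and merge it."""
    best, best_pair = -1.0, None
    for i in range(len(layers)):
        for j in range(i + 1, len(layers)):
            if layers[i].weight.shape != layers[j].weight.shape:
                continue
            sim = weight_similarity(layers[i], layers[j])
            if sim > best:
                best, best_pair = sim, (i, j)
    if best_pair is None:
        return layers  # nothing compatible to fuse
    i, j = best_pair
    fused = fuse_layers(layers[i], layers[j])
    return [fused if k == i else layer for k, layer in enumerate(layers) if k != j]


if __name__ == "__main__":
    stack = [nn.Linear(64, 64) for _ in range(6)]
    # Iteratively fuse (and, in practice, retrain between fusion steps)
    # until the desired compression ratio is reached.
    while len(stack) > 3:
        stack = fuse_most_similar(stack)
    print(f"remaining layers: {len(stack)}")
```

In the paper's setting, each fusion step would be interleaved with retraining, which is what allows the compressed networks to stay close to the original accuracy or perplexity at the reported compression ratios.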