Paper Title
Compressing Deep Neural Networks via Layer Fusion
Paper Authors
Paper Abstract
This paper proposes \textit{layer fusion}, a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional, and attention layers. Layer fusion can significantly reduce the number of layers in the original network with little additional computational overhead, while maintaining competitive performance. From experiments on CIFAR-10, we find that various deep convolutional neural networks can remain within 2\% accuracy points of the original networks up to a compression ratio of 3.33 when iteratively retrained with layer fusion. For experiments on the WikiText-2 language modelling dataset, where pretrained transformer models are used, we achieve compression to 20\% of the original network size while remaining within 5 perplexity points of the original network. We also find that other well-established compression techniques can achieve competitive performance relative to their original networks given a sufficient number of retraining steps. Generally, we observe a clear inflection point in performance as the amount of compression increases, suggesting a bound on the amount of compression that can be achieved before an exponential degradation in performance.
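To make the idea of "discovering which weights to combine and fusing similar layers" concrete, below is a minimal illustrative sketch in PyTorch. It is not the authors' implementation: the cosine-similarity criterion, the element-wise weight averaging used as the fusion rule, and the restriction to same-shaped linear layers are all simplifying assumptions made here for illustration.

```python
# Minimal sketch of the layer-fusion idea from the abstract:
# (1) score how similar the weights of same-shaped layers are, and
# (2) replace the most similar pair with a single fused layer,
# reducing network depth by one per fusion step.
# The similarity measure and fusion rule below are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


def weight_similarity(a: nn.Linear, b: nn.Linear) -> float:
    """Cosine similarity between the flattened weight matrices of two layers."""
    return F.cosine_similarity(a.weight.flatten(), b.weight.flatten(), dim=0).item()


def fuse_layers(a: nn.Linear, b: nn.Linear) -> nn.Linear:
    """Fuse two same-shaped linear layers by averaging their parameters."""
    fused = nn.Linear(a.in_features, a.out_features, bias=a.bias is not None)
    with torch.no_grad():
        fused.weight.copy_((a.weight + b.weight) / 2)
        if a.bias is not None:
            fused.bias.copy_((a.bias + b.bias) / 2)
    return fused


def fuse_most_similar(layers: list) -> list:
    """One fusion step: find the most similar compatible pair and merge it."""
    best, best_pair = -1.0, None
    for i in range(len(layers)):
        for j in range(i + 1, len(layers)):
            if layers[i].weight.shape != layers[j].weight.shape:
                continue
            sim = weight_similarity(layers[i], layers[j])
            if sim > best:
                best, best_pair = sim, (i, j)
    if best_pair is None:
        return layers  # nothing compatible to fuse
    i, j = best_pair
    fused = fuse_layers(layers[i], layers[j])
    return [fused if k == i else layer for k, layer in enumerate(layers) if k != j]


if __name__ == "__main__":
    stack = [nn.Linear(64, 64) for _ in range(6)]
    # Iteratively fuse (and, in practice, retrain between fusion steps)
    # until the desired compression ratio is reached.
    while len(stack) > 3:
        stack = fuse_most_similar(stack)
    print(f"remaining layers: {len(stack)}")
```

In the paper's setting, each fusion step would be interleaved with retraining, which is what allows the compressed networks to stay close to the original accuracy or perplexity at the reported compression ratios.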