Paper Title
Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks
Paper Authors
Paper Abstract
The enormous inference cost of deep neural networks can be scaled down by network compression. Pruning is one of the predominant approaches used for deep network compression. However, existing pruning techniques have one or more of the following limitations: 1) Additional energy cost on top of the compute-heavy training stage due to separate pruning and fine-tuning stages, 2) Layer-wise pruning based on the statistics of a particular layer, ignoring the effect of error propagation in the network, 3) Lack of an efficient estimate for determining the important channels globally, 4) Unstructured pruning, which requires specialized hardware to be exploited effectively. To address all the above issues, we present a simple-yet-effective gradual channel pruning while training methodology using a novel data-driven metric referred to as the feature relevance score. The proposed technique eliminates the additional retraining cycles by pruning the least important channels in a structured fashion at fixed intervals during the actual training phase. Feature relevance scores help in efficiently evaluating the contribution of each channel towards the discriminative power of the network. We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet using datasets such as CIFAR-10, CIFAR-100 and ImageNet, and successfully achieve significant model compression while trading off less than $1\%$ accuracy. Notably, on the CIFAR-10 dataset with ResNet-110, our approach achieves $2.4\times$ compression and a $56\%$ reduction in FLOPs with an accuracy drop of $0.01\%$ compared to the unpruned network.
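The following is a minimal sketch (not the authors' code) of the "prune while training" loop the abstract describes: at fixed epoch intervals, the lowest-scoring channels are masked out in a structured fashion, with no separate fine-tuning stage. The relevance proxy used below (mean absolute activation per channel on a probe batch) is only a stand-in for the paper's feature relevance score, whose exact definition the abstract does not give; the model, data, and hyperparameters are likewise illustrative.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy single-conv CNN with a per-channel pruning mask (illustrative only)."""
    def __init__(self, channels=32, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, channels, 3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.head = nn.Linear(channels, num_classes)
        # Binary mask over output channels; pruned channels stay zeroed.
        self.register_buffer("mask", torch.ones(channels))

    def forward(self, x):
        x = torch.relu(self.bn(self.conv(x)))
        x = x * self.mask.view(1, -1, 1, 1)   # structured channel masking
        x = x.mean(dim=(2, 3))                # global average pooling
        return self.head(x)

def channel_relevance(model, probe_batch):
    """Proxy relevance score: mean |activation| per channel on a probe batch."""
    with torch.no_grad():
        act = torch.relu(model.bn(model.conv(probe_batch)))
        return act.abs().mean(dim=(0, 2, 3))

def prune_lowest(model, probe_batch, fraction=0.1):
    """Zero the mask entries of the lowest-scoring still-active channels."""
    scores = channel_relevance(model, probe_batch)
    scores = scores.masked_fill(model.mask == 0, float("inf"))  # skip pruned channels
    k = int(fraction * model.mask.numel())
    if k > 0:
        _, idx = torch.topk(scores, k, largest=False)
        model.mask[idx] = 0.0

model = SmallCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    x = torch.randn(64, 3, 32, 32)            # stand-in for CIFAR-style batches
    y = torch.randint(0, 10, (64,))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    if (epoch + 1) % 5 == 0:                  # fixed pruning interval during training
        prune_lowest(model, torch.randn(16, 3, 32, 32), fraction=0.1)

print("active channels:", int(model.mask.sum().item()))
```

In the paper's setting the scoring is applied globally across layers of VGG/ResNet-scale networks and the mask corresponds to physically removing channels (hence the reported FLOP reductions); the single masked layer above is only meant to show the fixed-interval prune-while-training structure.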