Title
A "Network Pruning Network" Approach to Deep Model Compression
Authors
Abstract
We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural network with binary outputs that help identify the filters from each layer of the original network that do not contribute significantly to the model and can therefore be pruned. The pruner network has the same architecture as the original network, except that it has a multitask/multi-output last layer containing binary-valued outputs (one per filter), which indicate which filters are to be pruned. The pruner's goal is to minimize the number of filters in the original network by assigning zero weights to the corresponding output feature maps. In contrast to most existing methods, our approach does not rely on iterative pruning: it prunes the original network in one go and, moreover, does not require specifying the degree of pruning for each layer (it can learn this instead). The compressed model produced by our approach is generic and does not need any special hardware/software support. Moreover, augmenting our approach with other methods such as knowledge distillation, quantization, and connection pruning can further increase the degree of compression. We demonstrate the efficacy of our proposed approach on classification and object detection tasks.
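To make the filter-masking idea concrete, below is a minimal sketch, assuming PyTorch. It is not the authors' implementation: the class and variable names (`PrunerBlock`, `BinaryGate`, `target_conv`) are illustrative assumptions, and the pruner here mirrors only a single convolutional layer rather than the full target architecture described in the abstract. The sketch shows a pruner head emitting one binary output per filter, a straight-through estimator so the hard 0/1 gates remain trainable, and a sparsity term that counts the kept filters.

```python
# A minimal sketch (not the authors' exact formulation): a pruner that
# mirrors one conv layer of a pre-trained target network and emits one
# binary gate per filter. The gates zero out the target's output feature
# maps, and a sparsity penalty on the gates drives the filter count down.
# All module/variable names here are illustrative assumptions.
import torch
import torch.nn as nn


class BinaryGate(torch.autograd.Function):
    """Hard 0/1 threshold in the forward pass, straight-through gradient."""

    @staticmethod
    def forward(ctx, logits):
        return (logits > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator


class PrunerBlock(nn.Module):
    """Predicts a binary keep/prune decision for each filter of `target_conv`."""

    def __init__(self, target_conv: nn.Conv2d):
        super().__init__()
        n_filters = target_conv.out_channels
        # Pruner path shaped like the target layer, plus a multi-output
        # head producing one logit per filter of the target layer.
        self.features = nn.Sequential(
            nn.Conv2d(target_conv.in_channels, n_filters, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(n_filters, n_filters)

    def forward(self, x):
        # Average logits over the batch so the mask is per-filter, not per-sample.
        logits = self.head(self.features(x).flatten(1)).mean(0)  # (n_filters,)
        return BinaryGate.apply(logits)


# Usage: gate the target layer's feature maps and penalize kept filters.
target_conv = nn.Conv2d(3, 16, 3, padding=1)  # one layer of the pre-trained net
pruner = PrunerBlock(target_conv)
x = torch.randn(8, 3, 32, 32)
mask = pruner(x)                                 # binary, one entry per filter
gated = target_conv(x) * mask.view(1, -1, 1, 1)  # zero out pruned feature maps
sparsity_loss = mask.sum()                       # add to the task loss
```

In a full training setup, `sparsity_loss` would be combined with the task loss (e.g. classification cross-entropy on the gated network) so that the pruner learns per-layer pruning ratios jointly, which is what removes the need to specify the degree of pruning by hand.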