Paper Title
Trainability Preserving Neural Pruning
Paper Authors
Paper Abstract
Many recent works have shown that trainability plays a central role in neural network pruning -- unattended broken trainability can lead to severe under-performance and unintentionally amplify the effect of the retraining learning rate, resulting in biased (or even misinterpreted) benchmark results. This paper introduces trainability preserving pruning (TPP), a scalable method to preserve network trainability against pruning, aiming for improved pruning performance and greater robustness to retraining hyper-parameters (e.g., learning rate). Specifically, we propose to penalize the Gram matrix of convolutional filters to decorrelate the pruned filters from the retained filters. In addition to the convolutional layers, in the spirit of preserving the trainability of the whole network, we also propose to regularize the batch normalization parameters (scale and bias). Empirical studies on linear MLP networks show that TPP can perform on par with the oracle trainability recovery scheme. On nonlinear ConvNets (ResNet56/VGG19) on CIFAR10/100, TPP outperforms counterpart approaches by a clear margin. Moreover, results on ImageNet-1K with ResNets suggest that TPP consistently performs favorably against other top-performing structured pruning approaches. Code: https://github.com/MingSun-Tse/TPP.
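For intuition, below is a minimal PyTorch-style sketch of a regularizer in the spirit the abstract describes: penalizing Gram-matrix cross terms between pruned and retained filters, and shrinking the batch-normalization parameters of pruned channels. The function names, the `kept_mask` argument, and the exact loss form are illustrative assumptions rather than the authors' official implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

def tpp_style_filter_penalty(conv: nn.Conv2d, kept_mask: torch.Tensor) -> torch.Tensor:
    # Flatten each filter into a row vector: [out_channels, in_channels * kH * kW].
    w = conv.weight.flatten(1)
    gram = w @ w.t()                      # Gram matrix of the filters
    pruned_mask = ~kept_mask
    # Cross-correlations between pruned and retained filters; driving these
    # toward zero decorrelates the pruned filters from the retained ones.
    cross = gram[pruned_mask][:, kept_mask]
    return cross.pow(2).sum()

def bn_penalty(bn: nn.BatchNorm2d, kept_mask: torch.Tensor) -> torch.Tensor:
    # Shrink the BN scale/bias of pruned channels toward zero -- one plausible
    # reading of "regularize the batch normalization parameters" in the abstract.
    pruned_mask = ~kept_mask
    return bn.weight[pruned_mask].pow(2).sum() + bn.bias[pruned_mask].pow(2).sum()

# Usage example with hypothetical layers and a mask keeping the first 8 of 16 filters.
conv = nn.Conv2d(3, 16, kernel_size=3)
bn = nn.BatchNorm2d(16)
kept = torch.zeros(16, dtype=torch.bool)
kept[:8] = True
loss = tpp_style_filter_penalty(conv, kept) + bn_penalty(bn, kept)
loss.backward()  # gradients flow into conv.weight, bn.weight, bn.bias
```

In practice such a penalty would be added, with some weighting coefficient, to the task loss during the pruning/regularization stage, so that the retained sub-network's behavior is disentangled from the filters that will be removed.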