Paper Title

Layer-adaptive sparsity for the Magnitude-based Pruning

Paper Authors

Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, Jinwoo Shin

Abstract

Recent discoveries on neural network pruning reveal that, with a carefully chosen layerwise sparsity, a simple magnitude-based pruning achieves state-of-the-art tradeoff between sparsity and performance. However, without a clear consensus on "how to choose," the layerwise sparsities are mostly selected algorithm-by-algorithm, often resorting to handcrafted heuristics or an extensive hyperparameter search. To fill this gap, we propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score; the score is a rescaled version of weight magnitude that incorporates the model-level $\ell_2$ distortion incurred by pruning, and does not require any hyperparameter tuning or heavy computation. Under various image classification setups, LAMP consistently outperforms popular existing schemes for layerwise sparsity selection. Furthermore, we observe that LAMP continues to outperform baselines even in weight-rewinding setups, while the connectivity-oriented layerwise sparsity (the strongest baseline overall) performs worse than a simple global magnitude-based pruning in this case. Code: https://github.com/jaeho-lee/layer-adaptive-sparsity
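
For concreteness, below is a minimal PyTorch-style sketch of the LAMP score as described in the abstract: each weight's squared magnitude is rescaled by the sum of squared magnitudes of all weights in the same layer that are at least as large, and pruning is then performed globally on these scores. The function names (`lamp_scores`, `global_lamp_prune`) and the mask-based interface are illustrative assumptions, not the interface of the authors' released code; refer to the linked repository for the official implementation.

```python
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """LAMP score sketch for one layer: w[u]^2 divided by the sum of w[v]^2
    over all weights v in the layer with |w[v]| >= |w[u]| (including u)."""
    w2 = weight.detach().pow(2).flatten()
    # Sort ascending so that, for sorted index i, the "at least as large"
    # weights are exactly sorted_w2[i:].
    sorted_w2, order = torch.sort(w2)
    # Reverse cumulative sum: tail_sums[i] = sum of sorted_w2[i:].
    tail_sums = torch.flip(torch.cumsum(torch.flip(sorted_w2, dims=[0]), dim=0), dims=[0])
    scores_sorted = sorted_w2 / tail_sums
    # Scatter the scores back to the original (unsorted) positions.
    scores = torch.empty_like(scores_sorted)
    scores[order] = scores_sorted
    return scores.view_as(weight)

def global_lamp_prune(weights: list[torch.Tensor], sparsity: float) -> list[torch.Tensor]:
    """Return per-layer binary masks keeping the (1 - sparsity) fraction of
    weights with the highest LAMP scores, selected globally across layers."""
    scores = [lamp_scores(w) for w in weights]
    flat = torch.cat([s.flatten() for s in scores])
    k = int(sparsity * flat.numel())  # number of weights to prune
    if k == 0:
        return [torch.ones_like(s) for s in scores]
    threshold = torch.kthvalue(flat, k).values
    return [(s > threshold).float() for s in scores]
```

A mask produced this way would typically be applied by zeroing the corresponding weights (e.g., `w.data.mul_(mask)`) before any fine-tuning or rewinding step. Note that, as stated in the abstract, the layerwise sparsities emerge automatically from the global threshold; no per-layer hyperparameters are tuned.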
