Paper Title

Learning Best Combination for Efficient N:M Sparsity

Paper Authors

Zhang, Yuxin; Lin, Mingbao; Lin, Zhihang; Luo, Yiting; Li, Ke; Chao, Fei; Wu, Yongjian; Ji, Rongrong

Paper Abstract

By forcing at most N out of M consecutive weights to be non-zero, the recent N:M network sparsity has received increasing attention for its two attractive advantages: 1) promising performance at high sparsity, and 2) significant speedups on NVIDIA A100 GPUs. However, recent studies require either an expensive pre-training phase or a heavy dense-gradient computation. In this paper, we show that N:M learning can be naturally characterized as a combinatorial problem that searches for the best combination candidate within a finite collection. Motivated by this characteristic, we solve N:M sparsity in an efficient divide-and-conquer manner. First, we divide the weight vector into $C_M^N$ combination subsets of a fixed size N. Then, we conquer the combinatorial problem by assigning each combination a learnable score that is jointly optimized with its associated weights. We prove that the introduced scoring mechanism can well model the relative importance between combination subsets, and by gradually removing low-scored subsets, N:M fine-grained sparsity can be efficiently optimized during the normal training phase. Comprehensive experiments demonstrate that our Learning Best Combination (LBC) performs consistently better than off-the-shelf N:M sparsity methods across various networks. Our project is released at \url{https://github.com/zyxxmu/LBC}.
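To make the divide-and-conquer idea concrete, below is a minimal PyTorch sketch, not the authors' implementation (see the linked repository for LBC itself). For each group of M consecutive weights it enumerates all $C_M^N$ candidate combinations, e.g. $C_4^2 = 6$ for the common 2:4 pattern, attaches a learnable score to each, and applies the top-scored combination as a binary mask. The class and parameter names (`NMCombinationMask`, `n`, `m`) are illustrative, and the paper's gradual removal schedule for low-scored subsets is omitted; only the end-state selection is shown.

```python
# Minimal sketch of combination-based N:M masking (assumed names, not LBC's code).
import itertools
import torch


class NMCombinationMask(torch.nn.Module):
    def __init__(self, num_groups: int, n: int = 2, m: int = 4):
        super().__init__()
        # Enumerate all C(M, N) candidate combinations as binary masks of length M.
        combos = list(itertools.combinations(range(m), n))
        masks = torch.zeros(len(combos), m)
        for i, idx in enumerate(combos):
            masks[i, list(idx)] = 1.0
        self.register_buffer("candidates", masks)  # shape: (C(M,N), M)
        # One learnable score per combination per weight group; in the paper these
        # scores are optimized jointly with the weights during normal training.
        self.scores = torch.nn.Parameter(1e-2 * torch.randn(num_groups, len(combos)))

    def forward(self, weight_groups: torch.Tensor) -> torch.Tensor:
        # weight_groups: (num_groups, M). Keep the highest-scored combination per
        # group; the paper instead removes low-scored combinations gradually, and
        # this argmax corresponds to the final state of that schedule.
        best = self.scores.argmax(dim=1)          # (num_groups,)
        mask = self.candidates[best]              # (num_groups, M)
        return weight_groups * mask


# Usage: 2:4 sparsity over a weight vector viewed as 4 groups of M=4 entries.
w = torch.randn(4, 4)
pruner = NMCombinationMask(num_groups=4, n=2, m=4)
print(pruner(w))  # each row keeps exactly N=2 of its M=4 entries
```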
