Paper Title

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Authors

Paolo Sylos Labini, Massimo Bernaschi, Francesco Silvestri, Flavio Vella

Abstract

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computationally expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since their features are specifically designed for tensor algebra (typically dense matrix products), it is commonly assumed that they are not suitable for applications with sparse data. To challenge this viewpoint, we discuss methods and present solutions for accelerating sparse matrix multiplication on such architectures. In particular, we present a 1-dimensional blocking algorithm with theoretical guarantees on the density, which builds dense blocks from arbitrary sparse matrices. Experimental results show that, even for unstructured and highly sparse matrices, our block-based solution, which exploits NVIDIA Tensor Cores, is faster than its sparse counterpart. We observed significant speed-ups of up to two orders of magnitude on real-world sparse matrices.
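
The sketch below illustrates the general idea behind 1-dimensional blocking of a sparse matrix, not the authors' specific algorithm or its density guarantees: rows with similar nonzero patterns are greedily grouped, each group is materialized as a small dense tile, and the sparse product A @ B is then computed tile-by-tile with dense GEMM (which, on suitable hardware, could be offloaded to Tensor Cores). The function names `greedy_row_blocks`, `blocked_spmm`, and the similarity threshold `tau` are hypothetical choices made for this illustration, and NumPy matrix multiplication stands in for a real accelerated kernel.

```python
# Illustrative sketch of 1-D row blocking for sparse matrix multiplication.
# NOT the paper's algorithm: a simple greedy grouping by Jaccard similarity
# of row nonzero patterns, with each group turned into a dense tile.
import numpy as np
from scipy.sparse import random as sparse_random, csr_matrix


def greedy_row_blocks(A: csr_matrix, tau: float = 0.3):
    """Greedily group rows whose column patterns overlap by at least `tau` (Jaccard)."""
    patterns = [set(A.indices[A.indptr[i]:A.indptr[i + 1]]) for i in range(A.shape[0])]
    blocks = []  # each block: (list of row ids, union of their column ids)
    for i, p in enumerate(patterns):
        if not p:
            continue  # all-zero row: contributes nothing to the product
        for rows, cols in blocks:
            if len(p & cols) / len(p | cols) >= tau:
                rows.append(i)
                cols |= p
                break
        else:
            blocks.append(([i], set(p)))
    return blocks


def blocked_spmm(A: csr_matrix, B: np.ndarray, tau: float = 0.3) -> np.ndarray:
    """Compute A @ B by extracting one dense tile per row group and using dense GEMM."""
    C = np.zeros((A.shape[0], B.shape[1]), dtype=B.dtype)
    for rows, cols in greedy_row_blocks(A, tau):
        cols = np.fromiter(cols, dtype=np.int64)
        dense_tile = A[rows][:, cols].toarray()  # small dense block of A
        C[rows] = dense_tile @ B[cols]           # dense GEMM per block
    return C


if __name__ == "__main__":
    A = sparse_random(256, 256, density=0.02, format="csr", random_state=0)
    B = np.random.rand(256, 8)
    assert np.allclose(blocked_spmm(A, B), A @ B)
```

In this toy version the trade-off is explicit: a larger `tau` yields denser but more numerous tiles, while a smaller `tau` merges more rows at the cost of padding the tiles with explicit zeros.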
