论文标题
在对称直线矩阵分区上
On Symmetric Rectilinear Matrix Partitioning
论文作者
论文摘要
即使分配不规则的工作量到处理单元,对于许多应用中的有效并行化也至关重要。在这项工作中,我们关注的是一种空间分区,称为稀疏矩阵的直线分区(也称为广义块分布)。更具体地说,在这项工作中,我们解决了平方矩阵对称直线分区的问题。通过对称,我们指的是矩阵的行和列相同的分区,产生了一个瓷砖,其中对角瓷砖(块)将是正方形。我们首先表明,解决此问题的最佳解决方案是NP-固定,我们提出了四个启发式方法来解决此问题的两个不同变体。我们对提出的启发式方法的计算复杂性进行了详尽的分析。为了使所提出的技术更适用于现实生活申请的情况,我们通过利用有效的稀疏策略以及有效的稀疏前缀和数据结构来进一步降低其计算复杂性。我们通过实验表明,所提出的算法在超过600个测试矩阵上是有效的。随着稀疏性,我们的方法在现代24核心系统上的Twitter图中需要不到3秒钟,并输出了一个解决方案,该解决方案的负载不平衡不超过1%。
Even distribution of irregular workload to processing units is crucial for efficient parallelization in many applications. In this work, we are concerned with a spatial partitioning called rectilinear partitioning (also known as generalized block distribution) of sparse matrices. More specifically, in this work, we address the problem of symmetric rectilinear partitioning of a square matrix. By symmetric, we mean the rows and columns of the matrix are identically partitioned yielding a tiling where the diagonal tiles (blocks) will be squares. We first show that the optimal solution to this problem is NP-hard, and we propose four heuristics to solve two different variants of this problem. We present a thorough analysis of the computational complexities of those proposed heuristics. To make the proposed techniques more applicable in real life application scenarios, we further reduce their computational complexities by utilizing effective sparsification strategies together with an efficient sparse prefix-sum data structure. We experimentally show the proposed algorithms are efficient and effective on more than six hundred test matrices. With sparsification, our methods take less than 3 seconds in the Twitter graph on a modern 24 core system and output a solution whose load imbalance is no worse than 1%.