Paper title
Orthogonal layers of parallelism in large-scale eigenvalue computations
Paper authors
Paper abstract
We address the communication overhead of distributed sparse matrix-(multiple)-vector multiplication in the context of large-scale eigensolvers, using filter diagonalization as an example. The basis of our study is a performance model which includes a communication metric that is computed directly from the matrix sparsity pattern without running any code. The performance model quantifies to what extent scalability and parallel efficiency are lost due to communication overhead. To restore scalability, we identify two orthogonal layers of parallelism in the filter diagonalization technique. In the horizontal layer, the rows of the sparse matrix are distributed across individual processes. In the vertical layer, bundles of multiple vectors are distributed across separate process groups. An analysis in terms of the communication metric predicts that scalability can be restored if, and only if, one implements the two orthogonal layers of parallelism via different distributed vector layouts. Our theoretical analysis is corroborated by benchmarks for application matrices from quantum and solid state physics, road networks, and nonlinear programming. We finally demonstrate the benefits of using orthogonal layers of parallelism with two exemplary application cases (an exciton system and a strongly correlated electron system), which incur either small or large communication overhead.
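To make the idea of a pattern-only communication metric concrete, the following is a minimal sketch under simplifying assumptions; it is not the authors' metric or code. It assumes rows (and the matching vector entries) are split into contiguous, nearly equal blocks, and it counts the distinct remote vector entries each process must receive for one sparse matrix-vector product. It then compares a purely horizontal layout (all `P` processes share the rows of every vector) with a hypothetical orthogonal layout in which `P = p_row * p_vec` and each of the `p_vec` process groups owns its own bundle of vectors. The function names `row_block_bounds`, `halo_volume`, `spmmv_volume`, and `laplacian_1d_csr` are illustrative inventions.

```python
from typing import List, Sequence, Tuple


def row_block_bounds(n: int, p: int) -> List[range]:
    """Split rows 0..n-1 into p contiguous, nearly equal blocks (assumed layout)."""
    base, extra = divmod(n, p)
    bounds, start = [], 0
    for r in range(p):
        size = base + (1 if r < extra else 0)
        bounds.append(range(start, start + size))
        start += size
    return bounds


def halo_volume(indptr: Sequence[int], indices: Sequence[int], n: int, p: int) -> int:
    """Total number of remote vector entries received by all p processes for
    one SpMV y = A x, computed from the CSR sparsity pattern alone."""
    total = 0
    for block in row_block_bounds(n, p):
        needed = set()
        for i in block:
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j < block.start or j >= block.stop:  # column owned by another process
                    needed.add(j)
        total += len(needed)
    return total


def spmmv_volume(indptr: Sequence[int], indices: Sequence[int], n: int,
                 p_row: int, p_vec: int, n_vectors: int) -> int:
    """Communication volume for one SpMMV with n_vectors vectors, assuming
    p_vec groups each own n_vectors / p_vec vectors and split rows over p_row
    processes (hypothetical simplified model)."""
    assert n_vectors % p_vec == 0, "vectors must divide evenly into bundles"
    per_group = n_vectors // p_vec
    return p_vec * per_group * halo_volume(indptr, indices, n, p_row)


def laplacian_1d_csr(n: int) -> Tuple[List[int], List[int]]:
    """CSR pattern of a tridiagonal (1D Laplacian) matrix, used as a toy example."""
    indptr, indices = [0], []
    for i in range(n):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                indices.append(j)
        indptr.append(len(indices))
    return indptr, indices


if __name__ == "__main__":
    indptr, indices = laplacian_1d_csr(16)
    # 8 processes, 8 vectors: horizontal layer only vs. 2 x 4 orthogonal layers.
    print(spmmv_volume(indptr, indices, 16, p_row=8, p_vec=1, n_vectors=8))  # 112
    print(spmmv_volume(indptr, indices, 16, p_row=2, p_vec=4, n_vectors=8))  # 16
```

In this toy model the arithmetic work per process is the same in both layouts, but the halo is governed by `p_row` rather than by the total process count. This is one simple way to see why distributing vector bundles across process groups can reduce communication. Real matrices, partitionings, and the paper's own metric may behave quite differently.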