Paper Title

Distributed-Memory DMRG via Sparse and Dense Parallel Tensor Contractions

Authors

Ryan Levy, Edgar Solomonik, Bryan K. Clark

Abstract

The Density Matrix Renormalization Group (DMRG) algorithm is a powerful tool for solving eigenvalue problems to model quantum systems. DMRG relies on tensor contractions and dense linear algebra to compute properties of condensed matter systems. However, its efficient parallel implementation is challenging due to limited concurrency, large memory footprint, and tensor sparsity. We mitigate these problems by implementing two new parallel approaches that handle the block sparsity arising in DMRG, via Cyclops, a distributed-memory tensor contraction library. We benchmark their performance on two physical systems using the Blue Waters and Stampede2 supercomputers. Our DMRG performance is improved by up to 5.9X in runtime and 99X in processing rate over ITensor, at roughly comparable computational resource use. This enables higher-accuracy calculations via larger tensors for quantum state approximation. We demonstrate that, despite its limited concurrency, DMRG is weakly scalable when efficient parallel tensor contraction mechanisms are used.
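The tensor contractions at the heart of DMRG apply an effective Hamiltonian, built from left/right environment tensors and an MPO tensor, to a site wavefunction. The paper's implementation uses Cyclops (a distributed-memory C++ library) with block-sparse tensors; the sketch below is only a minimal serial NumPy illustration of the contraction pattern, with hypothetical small bond dimensions, not the authors' code.

```python
import numpy as np

# Hypothetical toy dimensions: MPS bond D, physical dim d, MPO bond w.
D, d, w = 8, 2, 4

rng = np.random.default_rng(0)
L = rng.standard_normal((D, w, D))     # left environment  (l, wl, l')
W = rng.standard_normal((w, w, d, d))  # MPO tensor        (wl, wr, s, s')
R = rng.standard_normal((D, w, D))     # right environment (r, wr, r')
psi = rng.standard_normal((D, d, D))   # site wavefunction (l', s', r')

# One application of the effective Hamiltonian, H_eff @ psi,
# expressed as a single tensor contraction over shared indices.
Hpsi = np.einsum('awb,wvst,cvd,btd->asc', L, W, R, psi)
assert Hpsi.shape == psi.shape  # output lives in the same space as psi
```

In a distributed-memory setting such as Cyclops, the same index-label contraction is written once and the library handles data distribution and communication; the iterative eigensolver in DMRG repeatedly applies this contraction to refine psi.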
