Paper Title

A GPU-Accelerated Barycentric Lagrange Treecode

Authors

Nathan Vaughn, Leighton Wilson, Robert Krasny

Abstract

We present an MPI + OpenACC implementation of the kernel-independent barycentric Lagrange treecode (BLTC) for fast summation of particle interactions on GPUs. The distributed memory parallelization uses recursive coordinate bisection for domain decomposition and MPI remote memory access to build locally essential trees on each rank. The particle interactions are organized into target batch/source cluster interactions which efficiently map onto the GPU; target batching provides an outer level of parallelism, while the direct sum form of the barycentric particle-cluster approximation provides an inner level of parallelism. The GPU-accelerated BLTC performance is demonstrated on several test cases up to 1 billion particles interacting via the Coulomb potential and Yukawa potential.
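
To make the "direct sum form" of the barycentric particle-cluster approximation concrete, here is a minimal serial NumPy sketch for a single target batch and a single source cluster with the Coulomb kernel. It is an illustration only, not the authors' MPI + OpenACC code. The function names, the use of Chebyshev points of the second kind, and the choice of the minimal bounding box of the sources as the cluster box are all assumptions made for this example. The source charges are first mapped to modified charges on a tensor-product interpolation grid; the cluster potential is then a plain kernel sum over grid points, which has the same loop structure as a direct particle-particle sum.

```python
import numpy as np

def cheb_points(n, a, b):
    """Chebyshev points of the second kind on [a, b] and their barycentric weights."""
    k = np.arange(n + 1)
    s = 0.5 * (a + b) + 0.5 * (b - a) * np.cos(np.pi * k / n)
    w = (-1.0) ** k          # weights are defined up to a common factor
    w[0] *= 0.5
    w[-1] *= 0.5
    return s, w

def lagrange_basis(y, s, w):
    """Evaluate all Lagrange polynomials at y via the barycentric form.

    Returns an array of shape (len(y), len(s)).
    """
    diff = y[:, None] - s[None, :]
    hit = diff == 0.0                      # y coincides with a grid point
    diff[hit] = 1.0                        # placeholder to avoid division by zero
    terms = w / diff
    L = terms / terms.sum(axis=1, keepdims=True)
    rows = hit.any(axis=1)
    L[rows] = hit[rows].astype(float)      # exact indicator row at grid points
    return L

def particle_cluster_potential(targets, sources, charges, degree):
    """Approximate sum_j q_j / |x_i - y_j| for well-separated targets (hypothetical helper)."""
    lo, hi = sources.min(axis=0), sources.max(axis=0)   # assumed cluster box
    grids, bases = [], []
    for d in range(3):
        s, w = cheb_points(degree, lo[d], hi[d])
        grids.append(s)
        bases.append(lagrange_basis(sources[:, d], s, w))
    # Modified charges on the tensor-product grid.
    Q = np.einsum("ja,jb,jc,j->abc", bases[0], bases[1], bases[2], charges)
    S = np.stack(np.meshgrid(*grids, indexing="ij"), axis=-1).reshape(-1, 3)
    # Direct-sum form: kernel evaluated between targets and grid points.
    r = np.linalg.norm(targets[:, None, :] - S[None, :, :], axis=-1)
    return (Q.reshape(-1) / r).sum(axis=1)

def direct_sum(targets, sources, charges):
    """Reference O(M*N) Coulomb sum."""
    r = np.linalg.norm(targets[:, None, :] - sources[None, :, :], axis=-1)
    return (charges / r).sum(axis=1)

rng = np.random.default_rng(0)
sources = rng.random((200, 3))            # cluster in the unit cube
charges = rng.random(200)
targets = 5.0 + rng.random((50, 3))       # well-separated target batch
exact = direct_sum(targets, sources, charges)
approx = particle_cluster_potential(targets, sources, charges, degree=8)
rel_err = np.max(np.abs(approx - exact)) / np.max(np.abs(exact))
print(rel_err)
```

In this sketch the loop over targets corresponds to the outer level of parallelism described in the abstract, and the sum over interpolation points corresponds to the inner level. Only kernel evaluations are needed, so swapping the Coulomb kernel for a Yukawa kernel would change just the `Q / r` expression.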
