Paper Title


PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications

Authors

Zhang, Lingqi, Wahib, Mohamed, Chen, Peng, Meng, Jintao, Wang, Xiao, Endo, Toshio, Matsuoka, Satoshi

Abstract


Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel as many times as there are time/algorithm steps. The termination of each kernel implicitly acts as the barrier required after advancing the solution at every time step. We propose an execution model for running memory-bound iterative GPU kernels: PERsistent KernelS (PERKS). In this model, the time loop is moved inside the persistent kernel, and device-wide barriers are used for synchronization. We then reduce the traffic to device memory by caching a subset of the output of each time step in the unused registers and shared memory. PERKS can be generalized to any iterative solver: it is largely independent of the solver's implementation. We explain the design principles of PERKS and demonstrate its effectiveness for a wide range of iterative 2D/3D stencil benchmarks (geomean speedup of $2.12$x for 2D stencils and $1.24$x for 3D stencils over state-of-the-art libraries), and for a Krylov subspace conjugate gradient solver (geomean speedup of $4.86$x on smaller SpMV datasets from SuiteSparse and $1.43$x on larger SpMV datasets, over a state-of-the-art library). All PERKS-based implementations are available at: https://github.com/neozhang307/PERKS.
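The core idea of the abstract — moving the time loop on-device and replacing implicit kernel-boundary barriers with a device-wide barrier — can be sketched in CUDA with cooperative groups. This is an illustrative sketch only, not the paper's actual code: the names `step`, `perks_kernel`, and the buffer-swap pattern are assumptions, and a real PERKS kernel would additionally cache part of the working set in registers and shared memory across time steps.

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// One time step of a solver (placeholder body; a real stencil or
// SpMV update would go here). Illustrative, not from the paper.
__device__ void step(const float* in, float* out, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        out[i] = in[i];  // real solvers compute the new value here
}

// Persistent kernel: the host-side time loop moves inside the kernel,
// and grid.sync() provides the per-step device-wide barrier that kernel
// termination used to provide implicitly.
__global__ void perks_kernel(float* a, float* b, int n, int nsteps) {
    cg::grid_group grid = cg::this_grid();
    for (int t = 0; t < nsteps; ++t) {
        step(a, b, n);
        grid.sync();                     // device-wide barrier each step
        float* tmp = a; a = b; b = tmp;  // swap input/output buffers
    }
}

// grid.sync() is only legal with a cooperative launch, e.g.:
//   void* args[] = { &d_a, &d_b, &n, &nsteps };
//   cudaLaunchCooperativeKernel((void*)perks_kernel, gridDim, blockDim, args);
```

Note that cooperative launches require all blocks to be co-resident on the device, which is what makes the unused registers and shared memory of those resident blocks available as a persistent cache across time steps.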
