Paper title
Accelerating CFD simulation with high order finite difference method on curvilinear coordinates for modern GPU clusters
Paper authors
Paper abstract
High-fidelity flow simulation of complex geometries at high Reynolds number ($Re$) remains very challenging and demands more computational capability from HPC systems. However, the development of HPC on traditional CPU architectures has hit bottlenecks caused by high power consumption and technical difficulties. Heterogeneous-architecture computing has been put forward as a promising solution to these difficulties. GPU acceleration has been used in low-order CFD solvers on structured grids and in high-order solvers on unstructured meshes. High-order finite difference methods on structured grids offer many advantages, e.g. high efficiency, robustness and low storage; however, the strong dependence among points in a high-order finite difference scheme still limits their application on GPU platforms. In the present work, we propose a set of hardware-aware techniques to optimize the efficiency of data transfer between CPU and GPU and of communication between GPUs. An in-house multi-block structured CFD solver with high-order finite difference methods on curvilinear coordinates is ported to the GPU platform and achieves satisfactory performance, with a maximum speedup of around 2000x over a single CPU core. This work provides an efficient solution for applying GPU computing to CFD simulation with certain high-order finite difference methods on current GPU heterogeneous computers. The tests show that significant acceleration can be achieved on different GPUs.