Paper Title

Productive Performance Engineering for Weather and Climate Modeling with Python

Authors

Ben-Nun, Tal; Groner, Linus; Deconinck, Florian; Wicky, Tobias; Davis, Eddie; Dahm, Johann; Elbert, Oliver D.; George, Rhea; McGibbon, Jeremy; Trümper, Lukas; Wu, Elynn; Fuhrer, Oliver; Schulthess, Thomas; Hoefler, Torsten

Abstract

Earth system models are developed with a tight coupling to target hardware, often containing specialized code predicated on processor characteristics. This coupling stems from using imperative languages that hard-code computation schedules and layout. We present a detailed account of optimizing the Finite Volume Cubed-Sphere Dynamical Core (FV3), improving productivity and performance. By using a declarative Python-embedded stencil domain-specific language and data-centric optimization, we abstract hardware-specific details and define a semi-automated workflow for analyzing and optimizing weather and climate applications. The workflow utilizes both local and full-program optimization, as well as user-guided fine-tuning. To prune the infeasible global optimization space, we automatically utilize repeating code motifs via a novel transfer tuning approach. On the Piz Daint supercomputer, we scale to 2,400 GPUs, achieving speedups of up to 3.92x over the tuned production implementation at a fraction of the original code.
