Paper Title
Fast Stencil-Code Computation on a Wafer-Scale Processor
Paper Authors
Abstract
The performance of CPU-based and GPU-based systems is often low for PDE codes, where large, sparse, and often structured systems of linear equations must be solved. Iterative solvers are limited by data movement, both between caches and memory and between nodes. Here we describe the solution of such systems of equations on the Cerebras Systems CS-1, a wafer-scale processor that has the memory bandwidth and communication latency to perform well. We achieve 0.86 PFLOPS on a single wafer-scale system for the solution by BiCGStab of a linear system arising from a 7-point finite difference stencil on a 600 × 595 × 1536 mesh, which is about one third of the machine's peak performance. We explain the system, its architecture and programming, and its performance on this problem and related problems. We discuss issues of memory capacity and floating-point precision. We outline plans to extend this work towards full applications.
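To make the phrase "7-point finite difference stencil" concrete, the following is a minimal pure-Python sketch of the matrix-free product v = A u that an iterative solver such as BiCGStab would call once or twice per iteration. It is not the CS-1 implementation described in the paper. The function name `apply_stencil_7pt`, the constant coefficients (`center`, `off`), the flat row-major layout, and the zero boundary treatment are all assumptions made for illustration.

```python
def apply_stencil_7pt(u, nx, ny, nz, center=6.0, off=-1.0):
    """Return v = A u for a constant-coefficient 7-point stencil.

    u is a flat list of length nx*ny*nz in row-major (i, j, k) order.
    Neighbours outside the mesh are treated as zero (an assumed
    homogeneous Dirichlet boundary, chosen only for this sketch).
    """
    def idx(i, j, k):
        return (i * ny + j) * nz + k

    v = [0.0] * (nx * ny * nz)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                # Each output value reads the point itself plus up to
                # six face neighbours: hence "7-point".
                acc = center * u[idx(i, j, k)]
                if i > 0:
                    acc += off * u[idx(i - 1, j, k)]
                if i < nx - 1:
                    acc += off * u[idx(i + 1, j, k)]
                if j > 0:
                    acc += off * u[idx(i, j - 1, k)]
                if j < ny - 1:
                    acc += off * u[idx(i, j + 1, k)]
                if k > 0:
                    acc += off * u[idx(i, j, k - 1)]
                if k < nz - 1:
                    acc += off * u[idx(i, j, k + 1)]
                v[idx(i, j, k)] = acc
    return v


# Number of unknowns on the mesh quoted in the abstract.
unknowns = 600 * 595 * 1536

# Small usage example on a 3 x 3 x 3 mesh of ones.
v = apply_stencil_7pt([1.0] * 27, 3, 3, 3)
```

Each output value needs only seven loads and a handful of floating-point operations, so the arithmetic intensity is low. This is consistent with the abstract's point that such solvers are limited by data movement rather than by compute.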