Paper Title

Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing

Authors

Florian Zaruba, Fabian Schuiki, Luca Benini

Abstract

Data-parallel problems demand ever-growing floating-point (FP) operations per second under tight area- and energy-efficiency constraints. In this work, we present Manticore, a general-purpose, ultra-efficient chiplet-based architecture for data-parallel FP workloads. We have manufactured a prototype of the chiplet's computational core in the GlobalFoundries 22FDX process and demonstrate more than 5x improvement in energy efficiency on FP-intensive workloads compared to CPUs and GPUs. The compute capability at high energy and area efficiency is provided by Snitch clusters containing eight small integer cores, each controlling a large FPU. The core supports two custom ISA extensions: the SSR (stream semantic register) extension elides explicit load and store instructions by encoding them as register reads and writes, and the FREP extension decouples the integer core from the FPU, allowing floating-point instructions to be issued independently. Together, these two extensions allow the single-issue core to minimize its instruction fetch bandwidth and saturate the instruction bandwidth of the FPU, achieving FPU utilization above 90%, with more than 40% of the core area dedicated to the FPU.
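To make the SSR/FREP idea more concrete, the following is a toy instruction-count model of a dot product, written as a rough sketch and not taken from the paper. It assumes a hypothetical six-instruction baseline loop body (two loads, one fused multiply-add, two pointer increments, one branch) and a hypothetical eight-instruction SSR/FREP setup sequence. Names such as `Trace`, `dot_baseline`, `dot_ssr_frep`, and `setup_instrs` are illustrative only and do not reflect Snitch's actual ISA encoding, pipeline, or timing.

```python
"""Toy instruction-count model of a dot product on a single-issue core.

Hypothetical illustration only: the loop shape and setup cost are assumptions,
not Snitch's real SSR/FREP programming sequence.
"""
from dataclasses import dataclass


@dataclass
class Trace:
    fetched: int = 0  # instructions fetched by the integer core
    fma: int = 0      # fused multiply-adds executed by the FPU


def _check(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")


def dot_baseline(a, b):
    """Assumed baseline loop: every element costs explicit loads plus bookkeeping."""
    _check(a, b)
    t = Trace()
    acc = 0.0
    for x, y in zip(a, b):
        # Assumed body: fld, fld, fmadd.d, addi, addi, bne -> 6 fetches per element.
        acc = x * y + acc
        t.fetched += 6
        t.fma += 1
    return acc, t


def dot_ssr_frep(a, b, setup_instrs=8):
    """SSR + FREP style: loads become stream reads, the FMA is replayed by the FPU."""
    _check(a, b)
    t = Trace()
    # Hypothetical setup: configure two read streams and issue one repetition.
    t.fetched += setup_instrs
    # SSRs: reading the stream register implicitly pops the next element.
    stream_a, stream_b = iter(a), iter(b)
    t.fetched += 1  # the single fmadd.d that the repetition replays
    acc = 0.0
    for _ in range(len(a)):
        # The FPU repeats the buffered fmadd.d; the core fetches nothing new.
        acc = next(stream_a) * next(stream_b) + acc
        t.fma += 1
    return acc, t


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [0.5, 0.5, 0.5, 0.5]
    print(dot_baseline(a, b))
    print(dot_ssr_frep(a, b))
```

Under these assumptions the baseline fetch count grows as 6n while the SSR/FREP variant stays at a constant setup cost plus one instruction. This is the sense in which the abstract says the extensions minimize instruction fetch bandwidth while keeping the FPU busy. Real utilization figures depend on the hardware and are those reported in the paper, not this model.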