Paper Title
The nanoPU: Redesigning the CPU-Network Interface to Minimize RPC Tail Latency
Paper Authors
Paper Abstract
The nanoPU is a new networking-optimized CPU designed to minimize tail latency for RPCs. By bypassing the cache and memory hierarchy, the nanoPU directly places arriving messages into the CPU register file. The wire-to-wire latency through the application is just 65ns, about 13x faster than the current state-of-the-art. The nanoPU moves key functions from software to hardware: reliable network transport, congestion control, core selection, and thread scheduling. It also supports a unique feature to bound the tail latency experienced by high-priority applications. Our prototype nanoPU is based on a modified RISC-V CPU; we evaluate its performance using cycle-accurate simulations of 324 cores on AWS FPGAs, including real applications (MICA and chain replication).
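To make the register-file message interface concrete, the sketch below shows what an RPC handler might look like on such a CPU. This is a hypothetical illustration, not the paper's actual API: the register names netRX/netTX (bound here to x30/x31), the word-granular read/write semantics, and the 8-word message size are all assumptions made for the example; the only point carried over from the abstract is that messages flow through CPU registers rather than through loads and stores to the cache hierarchy.

```c
/* Hypothetical sketch of a nanoPU-style RPC echo handler.
 * Assumption: the hardware exposes the network RX and TX message queues
 * as two reserved general-purpose registers (called netRX/netTX here and
 * bound to x30/x31 purely for illustration). Reading netRX dequeues one
 * 64-bit word of the arriving message; writing netTX enqueues one word
 * of the reply. No loads or stores touch the cache hierarchy. */
#include <stdint.h>

static inline uint64_t net_rx_word(void) {
    uint64_t w;
    /* Dequeue one word of the incoming message from the RX register. */
    asm volatile ("mv %0, x30" : "=r"(w));
    return w;
}

static inline void net_tx_word(uint64_t w) {
    /* Enqueue one word of the outgoing message into the TX register. */
    asm volatile ("mv x31, %0" : : "r"(w));
}

/* Echo an 8-word request back to the sender. The reliable transport,
 * congestion control, core selection, and thread scheduling described
 * in the abstract would all be handled in hardware, below this code. */
void rpc_echo_handler(void) {
    for (int i = 0; i < 8; i++) {
        net_tx_word(net_rx_word());
    }
}
```

In a real system the compiler would also need to be told to reserve the two network registers so they are never allocated as ordinary temporaries; that detail is omitted from this sketch.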