Paper Title
The nanoPU: Redesigning the CPU-Network Interface to Minimize RPC Tail Latency
Paper Authors
Paper Abstract
The nanoPU is a new networking-optimized CPU designed to minimize tail latency for RPCs. By bypassing the cache and memory hierarchy, the nanoPU directly places arriving messages into the CPU register file. The wire-to-wire latency through the application is just 65ns, about 13x faster than the current state-of-the-art. The nanoPU moves key functions from software to hardware: reliable network transport, congestion control, core selection, and thread scheduling. It also supports a unique feature to bound the tail latency experienced by high-priority applications. Our prototype nanoPU is based on a modified RISC-V CPU; we evaluate its performance using cycle-accurate simulations of 324 cores on AWS FPGAs, including real applications (MICA and chain replication).
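To make the register-file message interface concrete, the sketch below shows what an RPC handler might look like on such a CPU. This is a hypothetical illustration, not the paper's actual API: the register names netRX/netTX (bound here to x30/x31), the word-granular read/write semantics, and the 8-word message size are all assumptions made for the example; the only point carried over from the abstract is that messages flow through CPU registers rather than through loads and stores to the cache hierarchy.

```c
/* Hypothetical sketch of a nanoPU-style RPC echo handler.
 * Assumption: the hardware exposes the network RX and TX message queues
 * as two reserved general-purpose registers (called netRX/netTX here and
 * bound to x30/x31 purely for illustration). Reading netRX dequeues one
 * 64-bit word of the arriving message; writing netTX enqueues one word
 * of the reply. No loads or stores touch the cache hierarchy. */
#include <stdint.h>

static inline uint64_t net_rx_word(void) {
    uint64_t w;
    /* Dequeue one word of the incoming message from the RX register. */
    asm volatile ("mv %0, x30" : "=r"(w));
    return w;
}

static inline void net_tx_word(uint64_t w) {
    /* Enqueue one word of the outgoing message into the TX register. */
    asm volatile ("mv x31, %0" : : "r"(w));
}

/* Echo an 8-word request back to the sender. The reliable transport,
 * congestion control, core selection, and thread scheduling described
 * in the abstract would all be handled in hardware, below this code. */
void rpc_echo_handler(void) {
    for (int i = 0; i < 8; i++) {
        net_tx_word(net_rx_word());
    }
}
```

In a real system the compiler would also need to be told to reserve the two network registers so they are never allocated as ordinary temporaries; that detail is omitted from this sketch.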