Efficient Data-Plane Memory Scheduling for In-Network Aggregation

Paper title

Efficient Data-Plane Memory Scheduling for In-Network Aggregation

Authors

Wang, Hao; Qin, Yuxuan; Lao, ChonLam; Le, Yanfang; Wu, Wenfei; Chen, Kai

Abstract

As the scale of distributed training grows, communication becomes a bottleneck. To accelerate communication, recent works introduce In-Network Aggregation (INA), which moves gradient summation into network middle-boxes, e.g., programmable switches, to reduce traffic volume. However, switch memory is scarce compared to the volume of gradients transmitted in distributed training. Although prior work applies methods such as pool-based streaming or dynamic sharing to tackle the mismatch, switch memory remains a potential performance bottleneck. Furthermore, we observe that switch memory is under-utilized because of the synchronization requirement for aggregator deallocation in recent works. To improve switch memory utilization, we propose ESA, an $\underline{E}$fficient Switch Memory $\underline{S}$cheduler for In-Network $\underline{A}$ggregation. At its core, ESA enforces a preemptive aggregator allocation primitive and introduces priority scheduling at the data plane, which improves switch memory utilization and average job completion time (JCT). Experiments show that ESA can improve the average JCT by up to $1.35\times$.
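The idea of preemptive aggregator allocation with priorities can be illustrated with a toy Python model. This is only a sketch under assumptions, not ESA's actual data-plane design: the abstract does not say how priorities are assigned, how slots are indexed, or where evicted partial sums go. The class names (`Aggregator`, `PreemptiveSwitch`), the `seq % num_slots` indexing, the fallback to a server, and the action labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Aggregator:
    """One switch memory slot holding a partial gradient sum (toy model)."""
    owner: Optional[Tuple[str, int]] = None  # (job, seq) currently using the slot
    priority: int = -1                       # priority of the owning job; higher wins
    value: int = 0                           # running sum of the gradient fragment
    seen: int = 0                            # number of worker packets aggregated


class PreemptiveSwitch:
    """Hypothetical switch with a fixed pool of aggregator slots."""

    def __init__(self, num_slots: int, workers_per_job: int):
        self.slots = [Aggregator() for _ in range(num_slots)]
        self.workers_per_job = workers_per_job

    def on_packet(self, job: str, priority: int, seq: int, grad: int):
        """Process one gradient packet and return the list of resulting actions."""
        actions = []
        slot = self.slots[seq % len(self.slots)]  # assumed slot indexing
        owner = (job, seq)

        if slot.owner is not None and slot.owner != owner:
            if priority <= slot.priority:
                # Slot is held by an equal or higher priority job:
                # bypass the switch instead of waiting for deallocation.
                return [("to_server", owner, grad)]
            # Higher priority packet preempts: flush the partial sum out.
            actions.append(("evict_to_server", slot.owner, slot.value))
            slot.owner = None

        if slot.owner is None:
            # Claim the free (or just evicted) slot.
            slot.owner, slot.priority = owner, priority
            slot.value, slot.seen = 0, 0

        slot.value += grad
        slot.seen += 1
        if slot.seen == self.workers_per_job:
            # All workers contributed: emit the sum and release the slot.
            actions.append(("multicast_result", owner, slot.value))
            slot.owner = None
        return actions


if __name__ == "__main__":
    sw = PreemptiveSwitch(num_slots=1, workers_per_job=2)
    print(sw.on_packet("low", 1, 0, 5))    # low-priority job claims the slot
    print(sw.on_packet("high", 9, 0, 7))   # high-priority job preempts it
    print(sw.on_packet("low", 1, 0, 3))    # low-priority packet bypasses the switch
    print(sw.on_packet("high", 9, 0, 4))   # high-priority aggregation completes
```

The point of the sketch is that a slot is never held idle while its owner waits for slower workers if a more urgent job needs it. It leaves out how a server would merge evicted partial sums with late packets, which a real design would have to handle.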
