Paper Title

HyperTune: Dynamic Hyperparameter Tuning for Efficient Distribution of DNN Training over Heterogeneous Systems

Paper Authors

Ali HeydariGorji, Siavash Rezaei, Mahdi Torabzadehkashi, Hossein Bobarshad, Vladimir Alves, Pai H. Chou

Paper Abstract

Distributed training is a novel approach to accelerate Deep Neural Network (DNN) training, but common training libraries fall short of addressing distributed cases with heterogeneous processors or cases where the processing nodes get interrupted by other workloads. This paper describes distributed training of DNNs on computational storage devices (CSD), which are NAND flash-based, high-capacity data storage devices with internal processing engines. A CSD-based distributed architecture incorporates the advantages of federated learning in terms of performance scalability, resiliency, and data privacy by eliminating unnecessary data movement between the storage device and the host processor. The paper also describes Stannis, a DNN training framework that improves on the shortcomings of existing distributed training frameworks by dynamically tuning the training hyperparameters in heterogeneous systems to maintain the maximum overall processing speed in terms of processed images per second as well as energy efficiency. Experimental results on image classification training benchmarks show up to 3.1x improvement in performance and a 2.45x reduction in energy consumption when using Stannis plus CSD compared to generic systems.
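
The abstract only states that Stannis dynamically tunes training hyperparameters to keep a heterogeneous cluster running at maximum overall throughput; it does not spell out the tuning rule. The sketch below is a generic illustration of that idea, not the paper's actual algorithm: it splits a fixed global batch across nodes in proportion to each node's measured throughput (images per second), so faster host processors and slower CSDs finish a synchronous step at roughly the same time. All names (NodeStats, rebalance_batch_sizes, the example throughput numbers) are hypothetical.

```python
# Illustrative sketch only: a throughput-proportional batch-size split for
# heterogeneous data-parallel training. This is an assumption about how such
# tuning could look, not Stannis's published algorithm.

from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    images_per_sec: float  # measured throughput of this node's processing engine


def rebalance_batch_sizes(nodes: list[NodeStats], total_batch: int) -> dict[str, int]:
    """Split a fixed global batch across nodes in proportion to throughput,
    so that all nodes finish each synchronous training step at about the
    same time despite having different processing speeds."""
    total_rate = sum(n.images_per_sec for n in nodes)
    sizes = {
        n.name: max(1, round(total_batch * n.images_per_sec / total_rate))
        for n in nodes
    }
    # Correct rounding drift so per-node sizes still sum to the global batch.
    drift = total_batch - sum(sizes.values())
    fastest = max(nodes, key=lambda n: n.images_per_sec).name
    sizes[fastest] += drift
    return sizes


if __name__ == "__main__":
    # Hypothetical cluster: one GPU host plus two slower computational storage devices.
    cluster = [
        NodeStats("host-gpu", 900.0),
        NodeStats("csd-0", 120.0),
        NodeStats("csd-1", 110.0),
    ]
    print(rebalance_batch_sizes(cluster, total_batch=256))
```

In practice such a rebalancing step could be rerun whenever a node's measured throughput changes (for example, when the host is interrupted by other workloads), which is the kind of dynamic adjustment the abstract attributes to Stannis.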
