Title

PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep Learning Clusters

Authors

Isabelly Rocha, Nathaniel Morris, Lydia Y. Chen, Pascal Felber, Robert Birke, Valerio Schiavoni

Abstract

DNN learning jobs are common in today's clusters due to the advances in AI-driven services such as machine translation and image recognition. The most critical phase of these jobs for model performance and learning cost is the tuning of hyperparameters. Existing approaches make use of techniques such as early stopping criteria to reduce the tuning impact on learning cost. However, these strategies do not consider the impact that certain hyperparameters and system parameters have on training time. This paper presents PipeTune, a framework for DNN learning jobs that addresses the trade-offs between these two types of parameters. PipeTune takes advantage of the high parallelism and recurring characteristics of such jobs to minimize the learning cost via a pipelined simultaneous tuning of both hyper and system parameters. Our experimental evaluation using three different types of workloads indicates that PipeTune achieves up to a 22.6% reduction in tuning time and a 1.7x speed-up in training time. PipeTune not only improves performance but also lowers energy consumption by up to 29%.
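To make the pipelined tuning idea in the abstract concrete, below is a minimal, hypothetical Python sketch, not the authors' implementation: each hyperparameter trial trains epoch by epoch, and while an epoch runs, the next system-parameter choice is made concurrently from per-epoch timing profiles. The trial list, the assumed num_workers knob, the placeholder training function, and the selection policy are all illustrative assumptions.

# Hypothetical sketch of pipelined hyper/system parameter tuning.
# Not the PipeTune implementation: search spaces, the training stub,
# and the selection policy are illustrative placeholders.
from concurrent.futures import ThreadPoolExecutor
import random

HYPER_TRIALS = [{"lr": lr, "batch_size": bs}
                for lr in (0.1, 0.01, 0.001) for bs in (32, 128)]
SYSTEM_SPACE = [{"num_workers": w} for w in (1, 2, 4, 8)]  # assumed system knob

def train_epoch(hyper, system):
    # Placeholder: run one training epoch; returns (accuracy, epoch_time).
    return random.random(), random.uniform(1.0, 5.0)

def pick_system_params(profile):
    # Placeholder policy: try each system configuration once, then keep
    # the one with the lowest observed epoch time.
    untried = [s for s in SYSTEM_SPACE if s["num_workers"] not in profile]
    if untried:
        return untried[0]
    return min(SYSTEM_SPACE, key=lambda s: profile[s["num_workers"]])

def tune(epochs=5):
    best = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for hyper in HYPER_TRIALS:
            system, profile, acc = SYSTEM_SPACE[0], {}, 0.0
            for _ in range(epochs):
                # Pipelining: while the epoch trains under the current
                # system configuration, the next configuration is chosen
                # concurrently from the profiles gathered so far.
                future = pool.submit(pick_system_params, dict(profile))
                acc, epoch_time = train_epoch(hyper, system)
                profile[system["num_workers"]] = epoch_time
                system = future.result()
            if best is None or acc > best[0]:
                best = (acc, hyper, system)
    return best

if __name__ == "__main__":
    accuracy, hyper, system = tune()
    print("best accuracy", accuracy, "with", hyper, "and", system)

The key design point the sketch tries to convey is that system-parameter selection overlaps with training rather than running as a separate outer loop, so its cost is largely hidden behind the per-epoch work of the hyperparameter trials.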
