Paper Title
Dynamic Parameter Allocation in Parameter Servers
Paper Authors
Paper Abstract
To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management, a key concern in distributed training, but can induce severe communication overhead. To reduce this overhead, distributed machine learning algorithms use techniques to increase parameter access locality (PAL), achieving up to linear speed-ups. We found, however, that existing parameter servers provide only limited support for PAL techniques and therefore prevent efficient training. In this paper, we explore whether and to what extent PAL techniques can be supported, and whether such support is beneficial. We propose to integrate dynamic parameter allocation into parameter servers, describe an efficient implementation of such a parameter server called Lapse, and experimentally compare its performance to existing parameter servers across a number of machine learning tasks. We found that Lapse provides near-linear scaling and can be orders of magnitude faster than existing parameter servers.
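The abstract does not spell out the parameter-server interface, so the following is a minimal single-process sketch of the idea behind dynamic parameter allocation, not the authors' implementation: a key-value store that tracks per-key ownership and lets a worker relocate ("localize") a parameter to itself before a burst of accesses, turning remote accesses into local ones. All names here (ToyDynamicPS, pull, push, localize) are illustrative assumptions.

```python
import collections

class ToyDynamicPS:
    """Toy in-process key-value parameter store with per-key ownership.

    Hypothetical sketch: real parameter servers distribute keys across
    machines; here we only count which accesses would cross the network.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.owner = {}                               # key -> node holding it
        self.values = collections.defaultdict(float)  # key -> parameter value
        self.remote_accesses = 0                      # would-be network hops

    def _owner_of(self, key):
        # Static initial allocation via hash/range partitioning.
        if key not in self.owner:
            self.owner[key] = key % self.num_nodes
        return self.owner[key]

    def pull(self, node, key):
        """Read a parameter; non-local reads count as remote."""
        if self._owner_of(key) != node:
            self.remote_accesses += 1
        return self.values[key]

    def push(self, node, key, delta):
        """Apply an additive update; non-local writes count as remote."""
        if self._owner_of(key) != node:
            self.remote_accesses += 1
        self.values[key] += delta

    def localize(self, node, key):
        """Dynamically relocate `key` to `node` (one migration transfer)
        so that the node's subsequent accesses become local."""
        if self._owner_of(key) != node:
            self.remote_accesses += 1
            self.owner[key] = node

if __name__ == "__main__":
    # Node 0 performs a burst of 100 updates to parameter 7, which the
    # static allocation places on node 3 (7 % 4): every access is remote.
    static_ps = ToyDynamicPS(num_nodes=4)
    for _ in range(100):
        static_ps.push(0, 7, 0.01)
    print("static allocation, remote accesses:", static_ps.remote_accesses)    # 100

    # Relocating the parameter first turns the burst into local accesses.
    dynamic_ps = ToyDynamicPS(num_nodes=4)
    dynamic_ps.localize(0, 7)
    for _ in range(100):
        dynamic_ps.push(0, 7, 0.01)
    print("dynamic allocation, remote accesses:", dynamic_ps.remote_accesses)  # 1
```

The one-time migration cost pays off whenever a node accesses a parameter repeatedly before it moves again, which is precisely the access pattern that PAL techniques create.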