Paper Title

Adaptive Gradient Quantization for Data-Parallel SGD

Paper Authors

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel Roy, Ali Ramezani-Kebrya

Paper Abstract

Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.
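The core idea in the abstract, periodically refitting the quantization scheme to the current statistics of normalized gradient coordinates, can be illustrated with a small sketch. The following Python toy is not the paper's ALQ or AMQ algorithm: it runs on a single machine with a synthetic gradient stream, uses deterministic nearest-level quantization, and refits levels with plain Lloyd-Max iterations as a stand-in for the sufficient-statistics-based updates the paper describes. The helper names (`quantize`, `refit_levels`) are illustrative assumptions.

```python
import numpy as np

def quantize(v, levels):
    # Map each coordinate of the normalized vector to its nearest level.
    # (Illustrative; practical schemes such as QSGD use stochastic,
    # unbiased rounding instead of nearest-level assignment.)
    idx = np.abs(v[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

def refit_levels(samples, num_levels, iters=20):
    # Adapt the quantization levels to the observed coordinate distribution
    # with Lloyd-Max iterations: assign samples to their nearest level, then
    # move each level to the mean of its assigned samples.
    levels = np.quantile(samples, np.linspace(0.0, 1.0, num_levels))
    for _ in range(iters):
        idx = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(num_levels):
            members = samples[idx == k]
            if members.size > 0:
                levels[k] = members.mean()
    return np.sort(levels)

rng = np.random.default_rng(0)
levels = np.linspace(-1.0, 1.0, 8)  # fixed heuristic levels to start from

for step in range(100):
    # Stand-in gradient whose scale shrinks over training, mimicking the
    # drifting gradient statistics the abstract refers to.
    grad = rng.standard_normal(1024) / (1.0 + 0.1 * step)
    norm = np.linalg.norm(grad)
    normalized = grad / norm

    if step % 10 == 0:  # periodically refit levels as the statistics drift
        levels = refit_levels(normalized, num_levels=8)

    # Communicate only the norm plus per-coordinate level indices;
    # the receiver reconstructs an approximate gradient.
    reconstructed = norm * quantize(normalized, levels)
```

In the paper's data-parallel setting, each processor would compute these statistics on its own gradients and update its compression scheme in parallel, which is what keeps the adaptation cheap relative to communication.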
