Paper Title

Improving compute efficacy frontiers with SliceOut

Authors

Pascal Notin, Aidan N. Gomez, Joanna Yoo, Yarin Gal

Abstract

Pushing forward the compute efficacy frontier in deep learning is critical for tasks that require frequent model re-training or workloads that entail training a large number of models. We introduce SliceOut -- a dropout-inspired scheme designed to take advantage of GPU memory layout to train deep learning models faster without impacting final test accuracy. By dropping contiguous sets of units at random, our method realises training speedups through (1) fast memory access and matrix multiplication of smaller tensors, and (2) memory savings by avoiding allocating memory to zero units in weight gradients and activations. At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy. We demonstrate 10-40% speedups and memory reduction with Wide ResNets, EfficientNets, and Transformer models, with minimal to no loss in accuracy. This leads to faster processing of large computational workloads overall, and significantly reduces the resulting energy consumption and CO2 emissions.
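The abstract describes the core mechanism: instead of masking scattered units as in standard dropout, a random contiguous block of units is kept, so the matrix multiply itself shrinks and memory access stays contiguous. Below is a minimal illustrative sketch of that idea for a single fully connected layer. It is not the paper's implementation; the function names, the keep_frac parameter, and the inverted-dropout-style rescaling are assumptions made for illustration only.

```python
# Minimal sketch (NOT the authors' code) contrasting standard dropout with a
# SliceOut-style contiguous slice. Names like slice_out_linear and keep_frac
# are hypothetical; the rescaling mirrors inverted dropout and is an assumption.
import numpy as np

rng = np.random.default_rng(0)

def dropout_linear(x, W, p=0.3):
    """Standard dropout: zero random units, but still a full-size matmul."""
    h = x @ W                                    # (batch, hidden) full multiply
    mask = rng.random(h.shape[1]) >= p           # drop each unit independently
    return h * mask / (1.0 - p)                  # inverted-dropout rescaling

def slice_out_linear(x, W, keep_frac=0.7):
    """SliceOut-style: keep one random contiguous block of hidden units.
    Only the corresponding columns of W enter the matmul, so the multiply
    is smaller and the memory it touches is contiguous."""
    hidden = W.shape[1]
    width = int(hidden * keep_frac)
    start = rng.integers(0, hidden - width + 1)  # random contiguous slice
    h_small = x @ W[:, start:start + width]      # (batch, width) smaller matmul
    h = np.zeros((x.shape[0], hidden))
    h[:, start:start + width] = h_small / keep_frac  # rescale kept units
    return h

x = rng.standard_normal((8, 16))
W = rng.standard_normal((16, 32))
print(dropout_linear(x, W).shape, slice_out_linear(x, W).shape)
```

At test time the slicing is simply turned off and the full layer is used, which is what the abstract refers to as an implicit ensembling over the architectures seen during training.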
