Title

On Efficient Constructions of Checkpoints

Authors

Yu Chen, Zhenming Liu, Bin Ren, Xin Jin

Abstract

Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models. In this paper, we propose a lossy compression scheme for checkpoint constructions (called LC-Checkpoint). LC-Checkpoint simultaneously maximizes the compression rate and optimizes the recovery speed, under the assumption that SGD is used to train the model. LC-Checkpoint uses quantization and priority promotion to store the most crucial information for SGD to recover, and then uses Huffman coding to leverage the non-uniform distribution of the gradient scales. Our extensive experiments show that LC-Checkpoint achieves a compression rate up to $28\times$ and recovery speedup up to $5.77\times$ over a state-of-the-art algorithm (SCAR).
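The abstract only sketches the encoder, so the snippet below is a minimal illustrative sketch of the general idea rather than the paper's exact algorithm: quantize a checkpoint delta into exponent-based buckets, keep only the highest-magnitude buckets (a stand-in for "priority promotion"), and Huffman-code the resulting bucket indices to exploit their non-uniform distribution. The bucket definition, the choice of `keep_buckets`, and all function names are assumptions made for illustration.

```python
import heapq
from collections import Counter

import numpy as np


def exponent_bucket(x):
    """Map a value to a coarse bucket from its sign and binary exponent.

    Illustrative choice, not the paper's exact quantizer. The +128 shift keeps
    labels nonzero and makes abs(label) grow with magnitude (assumes |x| > 2^-128).
    """
    if x == 0.0:
        return 0
    sign = 1 if x > 0 else -1
    exp = int(np.floor(np.log2(abs(x))))
    return sign * (exp + 128)


def build_huffman_code(frequencies):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    heap = [[freq, i, [sym, ""]] for i, (sym, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {heap[0][2][0]: "0"}
    counter = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], counter] + lo[2:] + hi[2:])
        counter += 1
    return {sym: code for sym, code in heap[0][2:]}


def compress_delta(delta, keep_buckets=8):
    """Lossily compress a parameter delta (hypothetical encoder).

    1. Quantize each entry to an exponent bucket.
    2. Keep only the `keep_buckets` largest-magnitude buckets; everything
       else is dropped to the zero bucket (stand-in for priority promotion).
    3. Huffman-code the resulting bucket indices.
    """
    buckets = np.array([exponent_bucket(v) for v in delta])
    kept = sorted({int(b) for b in buckets} - {0}, key=abs, reverse=True)[:keep_buckets]
    buckets = np.where(np.isin(buckets, kept), buckets, 0)
    # representative value per kept bucket: mean of the entries it covers
    reps = {b: float(delta[buckets == b].mean()) for b in kept}
    reps[0] = 0.0
    code = build_huffman_code(Counter(buckets.tolist()))
    bitstream = "".join(code[b] for b in buckets.tolist())
    return bitstream, code, reps


def decompress_delta(bitstream, code, reps, length):
    """Decode the prefix-coded bitstream back into an approximate delta."""
    decode = {bits: sym for sym, bits in code.items()}
    out, cur = [], ""
    for bit in bitstream:
        cur += bit
        if cur in decode:
            out.append(reps[decode[cur]])
            cur = ""
    assert len(out) == length
    return np.array(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = rng.normal(scale=1e-2, size=1000)
    bits, code, reps = compress_delta(delta)
    approx = decompress_delta(bits, code, reps, len(delta))
    ratio = delta.nbytes * 8 / len(bits)
    print(f"compression ratio ~ {ratio:.1f}x, "
          f"max abs error = {np.max(np.abs(delta - approx)):.2e}")
```

In this sketch most entries fall into a handful of buckets (or the dropped zero bucket), so the Huffman code assigns them very short codewords; that skew in bucket frequencies is where the bulk of the compression comes from.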
