Paper Title
Dynamic Tensor Rematerialization
Paper Authors
Paper Abstract
Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Rematerialization (DTR), a greedy online algorithm for checkpointing that is extensible and general, is parameterized by eviction policy, and supports dynamic models. We prove that DTR can train an $N$-layer linear feedforward network on an $\Omega(\sqrt{N})$ memory budget with only $\mathcal{O}(N)$ tensor operations. DTR closely matches the performance of optimal static checkpointing in simulated experiments. We incorporate a DTR prototype into PyTorch merely by interposing on tensor allocations and operator calls and collecting lightweight metadata on tensors.
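The abstract's core mechanism — interpose on operator calls, track lightweight per-tensor metadata, and greedily evict/rematerialize under a memory budget — can be illustrated with a toy runtime. This is a minimal sketch, not the paper's implementation: the `DTRRuntime` and `Tensor` names are hypothetical, sizes and costs are abstract units, and the eviction score (recompute cost divided by size times staleness) is a simplified stand-in for the paper's heuristic.

```python
class Tensor:
    """A lazily rematerializable value plus lightweight metadata (hypothetical toy class)."""
    def __init__(self, op, parents, size, cost):
        self.op = op              # closure recomputing this value from parent values
        self.parents = parents    # parent Tensors needed for recomputation
        self.size = size          # memory footprint (abstract units)
        self.cost = cost          # compute cost to rematerialize (abstract units)
        self.value = None         # None means "evicted"
        self.last_access = 0

class DTRRuntime:
    """Toy greedy online checkpointing runtime, sketching the DTR idea."""
    def __init__(self, budget):
        self.budget = budget      # memory budget (abstract units)
        self.clock = 0            # logical clock for staleness
        self.resident = []        # tensors currently materialized

    def _score(self, t):
        # Prefer evicting tensors that are cheap to recompute, large, and stale.
        staleness = self.clock - t.last_access
        return t.cost / (t.size * max(staleness, 1))

    def _evict_until_fits(self, needed):
        used = sum(t.size for t in self.resident)
        while used + needed > self.budget and self.resident:
            victim = min(self.resident, key=self._score)   # greedy eviction
            self.resident.remove(victim)
            victim.value = None
            used -= victim.size

    def materialize(self, t):
        """Return t's value, recursively rematerializing evicted ancestors on demand."""
        self.clock += 1
        t.last_access = self.clock
        if t.value is not None:
            return t.value
        args = [self.materialize(p) for p in t.parents]
        self._evict_until_fits(t.size)
        t.value = t.op(*args)
        self.resident.append(t)
        return t.value

    def constant(self, value, size):
        t = Tensor(lambda: value, [], size, cost=1)
        self.materialize(t)
        return t

    def apply(self, op, parents, size, cost):
        # Interposition point: every operator call registers metadata here.
        t = Tensor(op, parents, size, cost)
        self.materialize(t)
        return t

# Usage: a 10-node linear chain trained-like traversal under a budget of 3 units.
rt = DTRRuntime(budget=3)
x = rt.constant(1.0, size=1)
for _ in range(9):
    x = rt.apply(lambda a: a + 1.0, [x], size=1, cost=1)
print(rt.materialize(x))  # correct result despite evictions along the chain
```

The chain example mirrors the abstract's linear feedforward setting: with only 3 units of memory for 10 intermediate tensors, the runtime evicts stale tensors and transparently recomputes them when a later operation needs them.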