Paper Title
Low-Precision Stochastic Gradient Langevin Dynamics
Paper Authors
Paper Abstract
While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.
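To make the setting concrete, below is a minimal sketch of one low-precision SGLD step: the standard Langevin update theta - lr * grad + sqrt(2 * lr) * noise is computed and the result is stored back on a fixed-point grid with gap `delta` via unbiased stochastic rounding. This is only an illustration under assumed names (`stochastic_round`, `sgld_step_low_precision`, `delta` are hypothetical); it uses plain stochastic rounding as a stand-in and does not reproduce the paper's variance-preserving quantization function or its full-precision gradient-accumulator variant.

```python
import torch

def stochastic_round(x, delta):
    """Quantize x onto a grid with spacing `delta` using unbiased
    stochastic rounding: round up with probability equal to the
    fractional distance, so E[Q(x)] = x (illustrative stand-in, not
    the paper's variance-preserving quantizer)."""
    scaled = x / delta
    low = torch.floor(scaled)
    prob_up = scaled - low
    rounded = low + (torch.rand_like(x) < prob_up).float()
    return rounded * delta

def sgld_step_low_precision(theta, grad, lr, delta):
    """One SGLD update stored in low precision (hypothetical sketch).

    theta : current parameters, already on the low-precision grid
    grad  : stochastic gradient of the negative log-posterior
    lr    : step size
    delta : quantization gap of the low-precision representation
    """
    noise = torch.randn_like(theta) * (2.0 * lr) ** 0.5  # Langevin noise N(0, 2*lr*I)
    proposal = theta - lr * grad + noise                  # full-precision proposal
    return stochastic_round(proposal, delta)              # write back in low precision
```

In this sketch the quantizer is applied only when writing the parameters back, which mirrors the "full-precision gradient accumulator" regime discussed in the abstract; quantizing the accumulated update itself is the harder case that motivates the paper's new quantization function.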