Paper Title

Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments

Paper Authors

Metel, Michael R.

Paper Abstract

Motivated by neural network training in low-precision arithmetic environments, this work studies the convergence of variants of SGD using adaptive step sizes with computational error. Considering a general stochastic Lipschitz continuous loss function, an asymptotic convergence result to a Clarke stationary point is proven as well as the non-asymptotic convergence to an approximate stationary point. It is assumed that only an approximation of the loss function's stochastic gradient can be computed in addition to error in computing the SGD step itself. Different variants of SGD are tested empirically, where improved test set accuracy is observed compared to SGD for two image recognition tasks.
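For intuition, below is a minimal sketch of one SGD step under the two error sources described in the abstract: only an approximation of the stochastic (sub)gradient is available, and the SGD step itself is computed with error. The `quantize` function, bit width, scale, step size schedule, and toy loss are illustrative assumptions for modeling a low-precision environment, not the paper's actual algorithms or experimental setup.

```python
import numpy as np

def quantize(x, num_bits=8, scale=0.05):
    """Round to a fixed-point grid to simulate low-precision arithmetic.
    Bit width and grid spacing are illustrative choices."""
    levels = 2 ** (num_bits - 1)
    return np.clip(np.round(x / scale), -levels, levels - 1) * scale

def sgd_step_low_precision(w, stochastic_subgrad, step_size):
    # Error source 1: only an approximation of the stochastic
    # (sub)gradient can be computed; modeled here by quantization.
    g = quantize(stochastic_subgrad(w))
    # Error source 2: the SGD step itself is computed with error,
    # so the updated iterate is rounded as well.
    return quantize(w - step_size * g)

# Toy example: f(w) = E|w - xi|, a nonsmooth Lipschitz continuous loss.
rng = np.random.default_rng(0)
subgrad = lambda w: np.sign(w - 0.1 * rng.standard_normal(w.shape))
w = np.array([1.0])
for t in range(200):
    # Simple decaying step size; the paper studies adaptive step sizes.
    w = sgd_step_low_precision(w, subgrad, step_size=0.1 / np.sqrt(t + 1))
```

With both error sources present, exact stationarity is generally unattainable, which matches the abstract's framing of non-asymptotic convergence to an approximate stationary point.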
