Paper Title
Gradient Correction beyond Gradient Descent
Paper Authors
Paper Abstract
The great success that neural networks have achieved is inseparable from the application of gradient-descent (GD) algorithms. Based on GD, many variant algorithms have emerged to improve the GD optimization process. The gradient used for back-propagation is arguably the most crucial factor in training a neural network. The quality of the computed gradient can be degraded by multiple factors, e.g., noisy data, computational errors, algorithmic limitations, and so on. To reveal gradient information beyond gradient descent, we introduce a framework (\textbf{GCGD}) to perform gradient correction. GCGD consists of two plug-in modules: 1) inspired by the idea of gradient prediction, we propose a \textbf{GC-W} module for weight-gradient correction; 2) based on Neural ODEs, we propose a \textbf{GC-ODE} module for hidden-state gradient correction. Experimental results show that our gradient correction framework can effectively improve gradient quality, reducing the number of training epochs by $\sim$20\% while also improving network performance.
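The abstract does not spell out the GC-W update rule, so the following is only a minimal PyTorch sketch of the general idea of weight-gradient correction via gradient prediction. The linear-extrapolation predictor, the blend weight alpha, and the class name GCWCorrector are all assumptions for illustration, not the paper's actual method:

    import torch

    class GCWCorrector:
        """Hypothetical sketch: correct each weight gradient by blending
        it with a prediction extrapolated from the previous gradient."""

        def __init__(self, params, alpha=0.5):
            self.params = list(params)
            self.alpha = alpha                      # blend: raw vs. predicted gradient
            self.prev = [None] * len(self.params)  # last observed raw gradients

        @torch.no_grad()
        def correct(self):
            for i, p in enumerate(self.params):
                if p.grad is None:
                    continue
                g = p.grad
                if self.prev[i] is not None:
                    # Linear extrapolation of the last two raw gradients
                    # predicts the next one; blend it with the raw gradient.
                    pred = 2 * g - self.prev[i]
                    p.grad = (1 - self.alpha) * g + self.alpha * pred
                self.prev[i] = g.clone()

In this sketch, correct() would be called between loss.backward() and optimizer.step(), which is what makes it a plug-in: the underlying optimizer is untouched and only sees the corrected gradients.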
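For GC-ODE, the abstract only states that it is based on Neural ODEs. A hypothetical sketch is given below, in which an identity layer refines the hidden-state gradient during back-propagation with a few Euler steps of a small vector field f; the architecture of f, the step count, and the step size are assumptions, and how f would be trained is not specified here (it stays at its random initialization in this sketch):

    import torch
    import torch.nn as nn

    class GCODE(nn.Module):
        """Hypothetical sketch: identity in the forward pass; the incoming
        hidden-state gradient is refined by integrating dg/dt = f(g)."""

        def __init__(self, dim, steps=3, dt=0.1):
            super().__init__()
            self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, dim))
            self.steps, self.dt = steps, dt

        def forward(self, h):
            if h.requires_grad:
                # Register a hook so the correction runs on h's gradient
                # when back-propagation reaches this point.
                h.register_hook(self._correct)
            return h

        @torch.no_grad()
        def _correct(self, grad):
            g = grad
            for _ in range(self.steps):
                g = g + self.dt * self.f(g)  # one explicit Euler step
            return g

A GCODE(dim) layer inserted between two hidden layers leaves the forward computation unchanged, so it matches the abstract's plug-in framing: only the gradient signal flowing backward through that point is modified.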