Paper Title

Neograd: Near-Ideal Gradient Descent

Paper Authors

Zimmer, Michael F.

Paper Abstract

The purpose of this paper is to improve upon existing variants of gradient descent by solving two problems: (1) removing (or reducing) the plateau that occurs while minimizing the cost function, (2) continually adjusting the learning rate to an "ideal" value. The approach taken is to approximately solve for the learning rate as a function of a trust metric. When this technique is hybridized with momentum, it creates an especially effective gradient descent variant, called NeogradM. It is shown to outperform Adam on several test problems, and can easily reach cost function values that are smaller by a factor of $10^8$, for example.

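Since only the abstract is reproduced here, the following is a minimal sketch of the idea it describes: a gradient-descent loop that adapts the learning rate using a trust metric and combines the step with momentum. The trust-metric definition used below (relative error of the first-order prediction of the cost decrease), the adaptation factor, and the names `neograd_sketch`, `rho_target`, and `beta` are illustrative assumptions, not the paper's actual NeogradM algorithm.

```python
import numpy as np

def neograd_sketch(cost, grad, w, alpha=1e-3, rho_target=0.1,
                   beta=0.9, n_steps=200):
    """Illustrative sketch (not the paper's exact method): adjust the
    learning rate so a trust metric rho stays near rho_target, and
    combine the step with a momentum term."""
    m = np.zeros_like(w)                       # momentum accumulator
    for _ in range(n_steps):
        g = grad(w)
        m = beta * m + (1.0 - beta) * g        # momentum update (assumed form)
        f0 = cost(w)
        w_new = w - alpha * m                  # candidate step
        predicted = alpha * np.dot(g, m)       # first-order predicted cost decrease
        actual = f0 - cost(w_new)              # realized cost decrease
        # Assumed trust metric: relative error of the linear prediction.
        rho = abs(actual - predicted) / (abs(predicted) + 1e-12)
        # Assumed adaptation: shrink alpha when rho exceeds the target,
        # grow it when the linear prediction is very accurate.
        alpha *= float(np.clip(rho_target / (rho + 1e-12), 0.5, 2.0))
        w = w_new
    return w, alpha

# Example usage on a simple quadratic bowl.
if __name__ == "__main__":
    A = np.diag([1.0, 10.0])
    cost = lambda w: 0.5 * w @ A @ w
    grad = lambda w: A @ w
    w_final, alpha_final = neograd_sketch(cost, grad, np.array([3.0, -2.0]))
    print(w_final, alpha_final)
```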