Title
A Variant of Gradient Descent Algorithm Based on Gradient Averaging
Authors
Abstract
In this work, we study an optimizer, Grad-Avg, for optimizing error functions. We mathematically establish the convergence of the sequence of Grad-Avg iterates to a minimizer (under a boundedness assumption). We apply Grad-Avg, along with some popular optimizers, to regression as well as classification tasks. In regression tasks, we observe that the behaviour of Grad-Avg is almost identical to that of Stochastic Gradient Descent (SGD), and we present a mathematical justification of this fact. In the case of classification tasks, the performance of Grad-Avg can be enhanced by suitably scaling the parameters. Experimental results demonstrate that Grad-Avg converges faster than other state-of-the-art optimizers on classification tasks on two benchmark datasets.
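The abstract does not spell out the Grad-Avg update rule, so the following is a minimal sketch of one plausible reading of "gradient averaging": at each step, the iterate moves along the mean of the gradients at the current and previous iterates. The function name `grad_avg`, the step size `lr`, and the toy quadratic objective are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def grad_avg(grad, x0, lr=0.1, n_iters=100, tol=1e-8):
    """Hypothetical Grad-Avg sketch: step along the average of the
    gradients evaluated at the current and previous iterates."""
    x_prev = x0.copy()
    g_prev = grad(x_prev)
    # First step is plain gradient descent (no earlier gradient exists yet).
    x = x_prev - lr * g_prev
    for _ in range(n_iters):
        g = grad(x)
        # Average the current gradient with the one from the previous iterate.
        step = 0.5 * (g + g_prev)
        x, x_prev, g_prev = x - lr * step, x, g
        if np.linalg.norm(step) < tol:
            break
    return x

# Usage on a toy quadratic f(x) = ||x||^2 / 2, whose gradient is x.
x_min = grad_avg(lambda x: x, x0=np.array([3.0, -2.0]))
print(x_min)  # should approach the minimizer at the origin
```

On a quadratic like this, averaging consecutive gradients smooths the update direction, which is consistent with the abstract's observation that Grad-Avg behaves much like SGD in regression settings; the exact rule used in the paper may differ.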