Title
A Variant of Gradient Descent Algorithm Based on Gradient Averaging
Authors
Abstract
In this work, we study an optimizer, Grad-Avg, for optimizing error functions. We mathematically establish the convergence of the sequence of Grad-Avg iterates to a minimizer (under a boundedness assumption). We apply Grad-Avg, along with some popular optimizers, to regression as well as classification tasks. In regression tasks, we observe that the behaviour of Grad-Avg is almost identical to that of Stochastic Gradient Descent (SGD), and we present a mathematical justification of this fact. In the case of classification tasks, the performance of Grad-Avg can be enhanced by suitably scaling the parameters. Experimental results demonstrate that Grad-Avg converges faster than other state-of-the-art optimizers on classification tasks on two benchmark datasets.
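The abstract does not spell out the Grad-Avg update rule, so the following is a minimal sketch of one plausible reading of "gradient averaging": at each step, the iterate moves along the mean of the gradients at the current and previous iterates. The function name `grad_avg`, the step size `lr`, and the toy quadratic objective are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def grad_avg(grad, x0, lr=0.1, n_iters=100, tol=1e-8):
    """Hypothetical Grad-Avg sketch: step along the average of the
    gradients evaluated at the current and previous iterates."""
    x_prev = x0.copy()
    g_prev = grad(x_prev)
    # First step is plain gradient descent (no earlier gradient exists yet).
    x = x_prev - lr * g_prev
    for _ in range(n_iters):
        g = grad(x)
        # Average the current gradient with the one from the previous iterate.
        step = 0.5 * (g + g_prev)
        x, x_prev, g_prev = x - lr * step, x, g
        if np.linalg.norm(step) < tol:
            break
    return x

# Usage on a toy quadratic f(x) = ||x||^2 / 2, whose gradient is x.
x_min = grad_avg(lambda x: x, x0=np.array([3.0, -2.0]))
print(x_min)  # should approach the minimizer at the origin
```

On a quadratic like this, averaging consecutive gradients smooths the update direction, which is consistent with the abstract's observation that Grad-Avg behaves much like SGD in regression settings; the exact rule used in the paper may differ.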