Paper Title
A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Paper Authors
Paper Abstract
We make contributions towards improving adaptive-optimizer performance. Our improvements are based on suppressing the range of the adaptive stepsizes in the AdaBelief optimizer. Firstly, we show that the particular placement of the parameter epsilon within the update expressions of AdaBelief reduces the range of the adaptive stepsizes, making AdaBelief closer to SGD with momentum. Secondly, we extend AdaBelief by further suppressing the range of the adaptive stepsizes. To achieve this, we perform mutual layerwise vector projections between the gradient g_t and its first momentum m_t before using them to estimate the second momentum. The new optimization method is referred to as Aida. Thirdly, extensive experimental results show that Aida outperforms nine optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification on CIFAR10 and CIFAR100, while matching the best performance of the nine methods when training WGAN-GP models for image generation. Furthermore, Aida produces higher validation accuracies than AdaBelief when training ResNet18 on ImageNet. Code is available at https://github.com/guoqiang-x-zhang/AidaOptimizer
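The abstract's key mechanism is the mutual layerwise vector projection between g_t and m_t before forming the "belief" term (g_t - m_t)^2. Below is a minimal, hedged sketch of one way to read that step: for each layer's parameter tensor, the gradient and its first momentum are alternately projected onto each other a small number of times, pulling them toward a common direction so that their difference, and hence the spread of adaptive stepsizes, shrinks. The function name mutual_projection, the iteration count k, and the numerical eps are illustrative assumptions, not values taken from the paper or repository; the exact ordering of projections and how the projected pair enters the second-moment update follow the official code.

```python
import torch

def mutual_projection(g, m, k=1, eps=1e-12):
    """Sketch of a layerwise mutual projection between a gradient g and its
    first momentum m (both tensors of one layer). Alternately project each
    vector onto the other k times; k and eps are illustrative defaults."""
    g_p, m_p = g.flatten(), m.flatten()
    for _ in range(k):
        # project m onto the current g, and g onto the current m
        m_new = (torch.dot(m_p, g_p) / (torch.dot(g_p, g_p) + eps)) * g_p
        g_new = (torch.dot(g_p, m_p) / (torch.dot(m_p, m_p) + eps)) * m_p
        g_p, m_p = g_new, m_new
    return g_p.view_as(g), m_p.view_as(m)

# Under this reading, the projected pair would replace (g_t, m_t) in
# AdaBelief's second-moment update, e.g.
#   s_t = beta2 * s_{t-1} + (1 - beta2) * (g_p - m_p)**2 + eps,
# so the smaller difference (g_p - m_p) narrows the range of adaptive stepsizes.
```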