Paper Title

When Do Flat Minima Optimizers Work?

Paper Authors

Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner

Paper Abstract

Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: 1. Stochastic Weight Averaging (SWA), and 2. Sharpness-Aware Minimization (SAM). However, there has been limited investigation into their properties and no systematic benchmarking of them across different domains. We fill this gap here by comparing the loss surfaces of the models trained with each method and through broad benchmarking across computer vision, natural language processing, and graph representation learning tasks. We discover several surprising findings from these results, which we hope will help researchers further improve deep learning optimizers, and practitioners identify the right optimizer for their problem.
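The two methods benchmarked in the paper admit compact descriptions: SWA averages weight snapshots collected along the tail of training, while SAM takes each descent step from an adversarially perturbed copy of the weights. Below is a minimal sketch of both update rules in plain PyTorch, not the authors' code; the model, data, and hyperparameters (e.g. the perturbation radius `rho`) are illustrative assumptions, not the paper's experimental settings.

```python
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One SAM step (sketch): ascend to the approximate worst-case point
    within an L2 ball of radius rho around the current weights, take the
    gradient there, then update the original weights with the base optimizer."""
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()  # first pass: gradient at w
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.clone() for p in params]
    scale = rho / (torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=scale.item())  # w_adv = w + rho * g / ||g||
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()  # second pass: gradient at w_adv
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(g, alpha=scale.item())  # restore the original weights w
    base_opt.step()  # descend from w using the gradient taken at w_adv

def swa_average(state_dicts):
    """SWA (sketch): average weight snapshots collected late in training,
    e.g. one per epoch; batch-norm statistics must be recomputed afterwards."""
    avg = {k: v.detach().clone().float() for k, v in state_dicts[0].items()}
    for sd in state_dicts[1:]:
        for k in avg:
            avg[k] += sd[k].float()
    return {k: v / len(state_dicts) for k, v in avg.items()}
```

In practice, PyTorch ships a ready-made SWA helper (`torch.optim.swa_utils.AveragedModel`), which practitioners would typically use instead of hand-rolling the average as above.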
