Paper Title
On the Algorithmic Stability and Generalization of Adaptive Optimization Methods
Paper Authors
Paper Abstract
Despite their popularity in deep learning and machine learning in general, the theoretical properties of adaptive optimizers such as Adagrad, RMSProp, Adam, or AdamW are not yet fully understood. In this paper, we develop a novel framework to study the stability and generalization of these optimization methods. Based on this framework, we prove guarantees for these properties, which depend heavily on a single parameter $\beta_2$. Our empirical experiments support our claims and provide practical insights into the stability and generalization properties of adaptive optimization methods.
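For context (this recap is not part of the original abstract): in the standard Adam update, $\beta_2$ is the exponential decay rate of the second-moment estimate of the gradients, which the abstract identifies as the parameter governing stability and generalization. A minimal statement of that update, with gradient $g_t$, step size $\alpha$, and moment decay rates $\beta_1, \beta_2$:
$$
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,
$$
$$
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}.
$$
Larger $\beta_2$ averages the squared gradients over a longer horizon, so the per-coordinate step sizes $\alpha / (\sqrt{\hat{v}_t} + \epsilon)$ change more slowly; this is the knob the paper's stability and generalization analysis is stated in terms of.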