Paper Title
Spherical Perspective on Learning with Normalization Layers
Paper Authors
Paper Abstract
Normalization Layers (NLs) are widely used in modern deep-learning architectures. Despite their apparent simplicity, their effect on optimization is not yet fully understood. This paper introduces a spherical framework to study the optimization of neural networks with NLs from a geometric perspective. Concretely, the radial invariance of groups of parameters, such as filters of convolutional neural networks, makes it possible to translate the optimization steps onto the $L_2$ unit hypersphere. This formulation and the associated geometric interpretation shed new light on the training dynamics. First, the first expression for Adam's effective learning rate is derived. Then, it is demonstrated within this framework that, in the presence of NLs, performing Stochastic Gradient Descent (SGD) alone is actually equivalent to a variant of Adam constrained to the unit hypersphere. Finally, the analysis outlines phenomena that previous variants of Adam act on, and their importance in the optimization process is experimentally validated.
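To make the radial invariance mentioned in the abstract concrete, below is a minimal PyTorch sketch (illustrative only, not code from the paper; the layer sizes and the scaling factor 10 are arbitrary choices). It checks that rescaling a convolution filter that is followed by batch normalization leaves the output unchanged, so only the filter's direction on the $L_2$ unit hypersphere matters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Convolution followed by batch normalization (no bias: BN would cancel it).
conv = nn.Conv2d(3, 8, kernel_size=3, bias=False)
bn = nn.BatchNorm2d(8)  # training mode: each channel normalized with batch statistics

x = torch.randn(4, 3, 16, 16)
with torch.no_grad():
    out_before = bn(conv(x))
    conv.weight[0].mul_(10.0)   # radial rescaling of one filter: w <- c * w, c > 0
    out_after = bn(conv(x))

# The normalized outputs match: the radial component of the filter is irrelevant.
print(torch.allclose(out_before, out_after, atol=1e-4))  # expected: True
```

Because of this invariance, only the angular displacement of such parameter groups affects the network's function, which is what motivates recasting their optimization steps on the unit hypersphere.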