Paper Title

New Interpretations of Normalization Methods in Deep Learning

Authors

Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li

Abstract

In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), and group normalization (GN). However, mathematical tools for analyzing all of these normalization methods have been lacking. In this paper, we first propose a lemma to define some necessary tools. We then use these tools to conduct an in-depth analysis of popular normalization methods and reach the following conclusions: 1) most normalization methods can be interpreted in a unified framework, namely normalizing pre-activations or weights onto a sphere; 2) since most existing normalization methods are scale-invariant, we can perform optimization on a sphere with the scaling symmetry removed, which helps stabilize network training; 3) we prove that training with these normalization methods causes the norm of the weights to increase, which could lead to adversarial vulnerability because it amplifies attacks. Finally, a series of experiments is conducted to verify these claims.
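As a rough illustration of the three claims above (a minimal NumPy sketch, not the paper's code; the toy batch, weight matrix, and direction-only loss are assumptions made for this example), the snippet below checks that a batch-normalized layer is invariant to rescaling its weights, that the gradient of a scale-invariant loss is numerically orthogonal to the weights, and that a gradient step therefore cannot shrink the weight norm:

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(z, eps=1e-5):
    # Center and scale each feature over the batch; up to eps, this
    # places the normalized pre-activations on a sphere (claim 1).
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

# Claims 1-2: a batch-normalized layer is invariant to scaling its weights.
X = rng.normal(size=(64, 10))   # toy input batch (assumed)
W = rng.normal(size=(10, 5))    # toy weight matrix (assumed)
print(np.allclose(batch_norm(X @ W), batch_norm(X @ (3.7 * W)), atol=1e-5))

# Claim 3: for a loss that depends only on the direction of w, the
# gradient is orthogonal to w, so a gradient step can only grow ||w||:
#   ||w - lr*g||^2 = ||w||^2 + lr^2 * ||g||^2   when <w, g> = 0.
target = rng.normal(size=8)
target /= np.linalg.norm(target)

def direction_only_loss(w):
    # Toy scale-invariant loss (assumed for illustration).
    u = w / np.linalg.norm(w)
    return float(np.sum((u - target) ** 2))

def num_grad(f, w, h=1e-6):
    # Central finite-difference gradient.
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

w = rng.normal(size=8)
g = num_grad(direction_only_loss, w)
print(abs(w @ g) < 1e-6)                                  # gradient orthogonal to w
print(np.linalg.norm(w - 0.1 * g) >= np.linalg.norm(w))   # weight norm does not shrink
```

All three printed checks evaluate to True; the last one is simply the Pythagorean identity ||w - lr*g||^2 = ||w||^2 + lr^2 * ||g||^2 applied with <w, g> = 0.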
