Paper Title

Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks

Authors

Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Dong Gu Lee, Wonseok Jeong, Sang Woo Kim

Abstract

L2 regularization for the weights of neural networks is widely used as a standard training trick. However, L2 regularization for gamma, the trainable scale parameter of batch normalization, has remained largely undiscussed and is applied inconsistently across libraries and practitioners. In this paper, we study whether L2 regularization for gamma is valid. To explore this issue, we consider two approaches: 1) variance control, so that the residual network behaves like an identity mapping, and 2) stable optimization through an improved effective learning rate. Through these two analyses, we specify the gammas for which L2 regularization is desirable and those for which it is undesirable, and propose four guidelines for managing them. In several experiments, we observed increases and decreases in performance when L2 regularization was applied to gammas of the four categories, consistent with our four guidelines. The proposed guidelines were validated across various tasks and architectures, including variants of residual networks and Transformers.
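In practice, the inconsistency noted above shows up in how optimizers are configured: batch normalization computes BN(x) = gamma * x_hat + beta, and frameworks differ on whether the weight-decay (L2) term also touches gamma. The PyTorch sketch below shows the relevant knob; it is a minimal illustration under assumed choices (resnet18, SGD, a decay of 5e-4, and exempting all gammas), not the paper's code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Minimal sketch, not the paper's code: the model (resnet18), optimizer (SGD),
# and weight-decay value (5e-4) are assumptions made for illustration.
model = resnet18(num_classes=10)

# gamma is the learnable scale of each BN layer (module.weight in PyTorch);
# collect the gammas separately from every other parameter.
gammas = [m.weight for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
gamma_ids = {id(p) for p in gammas}
others = [p for p in model.parameters() if id(p) not in gamma_ids]

# Two parameter groups make the choice explicit: here all gammas are exempted
# from L2 regularization, one of the conventions seen in the wild. (BN betas
# are left in the default group for brevity.)
optimizer = torch.optim.SGD(
    [
        {"params": others, "weight_decay": 5e-4},  # standard L2 on weights
        {"params": gammas, "weight_decay": 0.0},   # no L2 on BN gammas
    ],
    lr=0.1,
    momentum=0.9,
)
```

Applying the paper's guidelines would amount to splitting the gammas further into the four categories it identifies and enabling a non-zero decay only for those categories where regularization is desirable.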
