Paper Title

Batch Group Normalization

Authors

Xiao-Yun Zhou, Jiacheng Sun, Nanyang Ye, Xu Lan, Qijun Luo, Bo-Lin Lai, Pedro Esperanca, Guang-Zhong Yang, Zhenguo Li

Abstract

Deep Convolutional Neural Networks (DCNNs) are hard and time-consuming to train, and normalization is one of the effective solutions. Among previous normalization methods, Batch Normalization (BN) performs well at medium and large batch sizes and generalizes well to multiple vision tasks, but its performance degrades significantly at small batch sizes. In this paper, we find that BN also saturates at extremely large batch sizes, i.e., 128 images per worker (GPU), and propose that the degradation/saturation of BN at small/extremely large batch sizes is caused by noisy/confused statistic calculation. Hence, without adding new trainable parameters, using multi-layer or multi-iteration information, or introducing extra computation, Batch Group Normalization (BGN) is proposed to solve the noisy/confused statistic calculation of BN at small/extremely large batch sizes, using the channel, height, and width dimensions to compensate. The group technique from Group Normalization (GN) is adopted, and a hyper-parameter G controls the number of feature instances used for statistic calculation, so that the statistics are neither noisy nor confused across different batch sizes. We empirically demonstrate that BGN consistently outperforms BN, Instance Normalization (IN), Layer Normalization (LN), GN, and Positional Normalization (PN) across a wide spectrum of vision tasks, including image classification, Neural Architecture Search (NAS), adversarial learning, Few-Shot Learning (FSL), and Unsupervised Domain Adaptation (UDA), indicating its good performance, robust stability to batch size, and wide generalizability. For example, when training ResNet-50 on ImageNet with a batch size of 2, BN achieves a Top-1 accuracy of 66.512% while BGN achieves 76.096%, a notable improvement.
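To make the mechanism described in the abstract concrete, below is a minimal sketch of how BGN could be implemented, assuming (as the abstract suggests) that statistics are computed jointly over the batch dimension and channel groups, in contrast to GN, which normalizes each sample separately. The class name, argument names, and the running-statistics handling are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class BatchGroupNorm(nn.Module):
    """Sketch of Batch Group Normalization (BGN): channels are split into G
    groups and statistics are computed over batch, within-group channels,
    height, and width. Illustrative only; not the authors' implementation."""

    def __init__(self, num_channels, num_groups=8, eps=1e-5, momentum=0.1):
        super().__init__()
        assert num_channels % num_groups == 0
        self.num_groups = num_groups
        self.eps = eps
        self.momentum = momentum
        # Per-channel affine parameters, as in BN/GN.
        self.weight = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        # Running per-group statistics for inference (assumed, mirroring BN).
        self.register_buffer("running_mean", torch.zeros(num_groups))
        self.register_buffer("running_var", torch.ones(num_groups))

    def forward(self, x):
        n, c, h, w = x.shape
        g = self.num_groups
        # Reshape to expose the group dimension: (N, G, C//G, H, W).
        xg = x.view(n, g, c // g, h, w)
        if self.training:
            # Statistics over batch, grouped channels, height, and width.
            mean = xg.mean(dim=(0, 2, 3, 4))
            var = xg.var(dim=(0, 2, 3, 4), unbiased=False)
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        xg = (xg - mean.view(1, g, 1, 1, 1)) / torch.sqrt(var.view(1, g, 1, 1, 1) + self.eps)
        return xg.view(n, c, h, w) * self.weight + self.bias
```

With G = 1 every feature instance in the batch contributes to a single statistic (LN-like pooling across channels plus the batch), while G = C reduces the pooling to per-channel batch statistics as in BN; tuning G between these extremes is what, per the abstract, keeps the statistics neither noisy at small batch sizes nor confused at extremely large ones.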
