Paper Title
Extended Batch Normalization
Paper Authors
Paper Abstract
Batch normalization (BN) has become a standard technique for training modern deep networks. However, its effectiveness diminishes when the batch size becomes small, since the batch statistics estimation becomes inaccurate. This hinders the use of batch normalization for 1) training larger models, which requires small batches constrained by memory consumption, and 2) training on mobile or embedded devices, where memory resources are limited. In this paper, we propose a simple but effective method, called extended batch normalization (EBN). For NCHW format feature maps, extended batch normalization computes the mean along the (N, H, W) dimensions, just as batch normalization does, to maintain the advantage of batch normalization. To alleviate the problem caused by small batch sizes, extended batch normalization computes the standard deviation along the (N, C, H, W) dimensions, thus enlarging the number of samples from which the standard deviation is computed. We compare extended batch normalization with batch normalization and group normalization on MNIST, CIFAR-10/100, STL-10, and ImageNet. The experiments show that extended batch normalization alleviates the problem of batch normalization with small batch sizes while achieving performance close to that of batch normalization with large batch sizes.
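Below is a minimal NumPy sketch of the normalization scheme described in the abstract: a per-channel mean over (N, H, W) combined with a single standard deviation over all of (N, C, H, W). The function and variable names (extended_batch_norm, x, gamma, beta, eps) are illustrative assumptions, not the authors' implementation, and the choice to measure deviations around the per-channel mean is one plausible reading of the abstract rather than a confirmed detail of the paper.

```python
import numpy as np

def extended_batch_norm(x, gamma, beta, eps=1e-5):
    """Sketch of extended batch normalization for NCHW feature maps.

    x: array of shape (N, C, H, W); gamma, beta: per-channel affine parameters.
    """
    # Per-channel mean over (N, H, W), exactly as in standard batch normalization.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)       # shape (1, C, 1, 1)
    # Standard deviation over all of (N, C, H, W), i.e. a single scalar,
    # which enlarges the number of samples used for the std estimate.
    std = np.sqrt(((x - mean) ** 2).mean() + eps)      # scalar
    x_hat = (x - mean) / std
    # Per-channel scale and shift, as in batch normalization.
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

# Example usage on a small batch (N=2, C=3, H=4, W=4).
x = np.random.randn(2, 3, 4, 4).astype(np.float32)
gamma = np.ones(3, dtype=np.float32)
beta = np.zeros(3, dtype=np.float32)
y = extended_batch_norm(x, gamma, beta)
print(y.shape)  # (2, 3, 4, 4)
```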