Paper Title

Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour

Authors

Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc Le, Yang You, Sameer Kumar

Abstract

EfficientNets are a family of state-of-the-art image classification models based on efficiently scaled convolutional neural networks. Currently, EfficientNets can take on the order of days to train; for example, training an EfficientNet-B0 model takes 23 hours on a Cloud TPU v2-8 node. In this paper, we explore techniques to scale up the training of EfficientNets on TPU-v3 Pods with 2048 cores, motivated by speedups that can be achieved when training at such scales. We discuss optimizations required to scale training to a batch size of 65536 on 1024 TPU-v3 cores, such as selecting large batch optimizers and learning rate schedules as well as utilizing distributed evaluation and batch normalization techniques. Additionally, we present timing and performance benchmarks for EfficientNet models trained on the ImageNet dataset in order to analyze the behavior of EfficientNets at scale. With our optimizations, we are able to train EfficientNet on ImageNet to an accuracy of 83% in 1 hour and 4 minutes.
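The abstract names large-batch optimizers and learning rate schedules as key levers for scaling to a batch size of 65536. Below is a minimal sketch, not the authors' code, of one common large-batch recipe: the linear scaling rule with linear warmup and cosine decay. All constants (BASE_LR, BASE_BATCH, WARMUP_EPOCHS) are illustrative assumptions, not values from the paper.

```python
import math

# Assumed reference values (illustrative, not from the paper).
BASE_LR = 0.1        # learning rate tuned for BASE_BATCH
BASE_BATCH = 256     # reference batch size for BASE_LR
WARMUP_EPOCHS = 5    # length of the linear warmup phase

def learning_rate(epoch: float, batch_size: int, total_epochs: int) -> float:
    """Large-batch LR: linearly scaled peak, linear warmup, cosine decay."""
    peak_lr = BASE_LR * batch_size / BASE_BATCH   # linear scaling rule
    if epoch < WARMUP_EPOCHS:
        # Ramp from 0 up to the peak over the warmup epochs.
        return peak_lr * epoch / WARMUP_EPOCHS
    # Cosine-decay from the peak toward 0 over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (total_epochs - WARMUP_EPOCHS)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Example: at batch size 65536 the peak LR scales to 0.1 * 65536 / 256 = 25.6.
print(learning_rate(epoch=2.5, batch_size=65536, total_epochs=90))
```

At very large batch sizes, plain linear scaling can make SGD unstable, which is why the paper discusses selecting large-batch optimizers; the schedule shape above (warmup then decay) is the commonly used complement to such optimizers.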
