Paper Title

EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones

Paper Authors

Yulin Wang, Yang Yue, Rui Lu, Tianjiao Liu, Zhao Zhong, Shiji Song, Gao Huang

Abstract

The superior performance of modern deep networks usually comes with a costly training procedure. This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers). Our work is inspired by the inherent learning dynamics of deep networks: we experimentally show that at an earlier training stage, the model mainly learns to recognize some 'easier-to-learn' discriminative patterns within each example, e.g., the lower-frequency components of images and the original information before data augmentation. Driven by this phenomenon, we propose a curriculum where the model always leverages all the training data at each epoch, while the curriculum starts with only exposing the 'easier-to-learn' patterns of each example, and introduces gradually more difficult patterns. To implement this idea, we 1) introduce a cropping operation in the Fourier spectrum of the inputs, which enables the model to learn from only the lower-frequency components efficiently, 2) demonstrate that exposing the features of original images amounts to adopting weaker data augmentation, and 3) integrate 1) and 2) and design a curriculum learning schedule with a greedy-search algorithm. The resulting approach, EfficientTrain, is simple, general, yet surprisingly effective. As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, and CSWin) by >1.5x on ImageNet-1K/22K without sacrificing accuracy. It is also effective for self-supervised learning (e.g., MAE). Code is available at https://github.com/LeapLabTHU/EfficientTrain.
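The "cropping operation in the Fourier spectrum" mentioned above can be illustrated with a short sketch: transform an image to the frequency domain, keep only the central (low-frequency) block of the shifted spectrum, and transform back, yielding a smaller image that contains only the easier-to-learn low-frequency content. This is a hypothetical NumPy illustration of the idea, not the authors' implementation; the function name `low_freq_crop` and the `bandwidth` parameter are assumptions for this example.

```python
import numpy as np

def low_freq_crop(image, bandwidth):
    """Keep only the central bandwidth x bandwidth block of the image's
    2D Fourier spectrum and invert it back to pixel space.

    Hypothetical sketch of the low-frequency cropping idea from the
    abstract. `image` is a 2D float array; `bandwidth` (even) is the
    side length of the retained low-frequency block.
    """
    h, w = image.shape
    # fftshift moves the DC (zero-frequency) component to the center,
    # so low frequencies form a central block we can crop.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    cy, cx = h // 2, w // 2
    b = bandwidth // 2
    cropped = spectrum[cy - b:cy + b, cx - b:cx + b]
    # The inverse transform of the cropped spectrum is a smaller image
    # carrying only the low-frequency content, which is cheaper to
    # train on in early epochs.
    return np.real(np.fft.ifft2(np.fft.ifftshift(cropped)))
```

Cropping with the full bandwidth is a no-op (the whole spectrum is kept), while smaller bandwidths produce proportionally smaller, low-pass-filtered inputs, which is what makes the early training stages cheaper.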
