Paper Title

EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones

Paper Authors

Yulin Wang, Yang Yue, Rui Lu, Tianjiao Liu, Zhao Zhong, Shiji Song, Gao Huang

Abstract

The superior performance of modern deep networks usually comes with a costly training procedure. This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers). Our work is inspired by the inherent learning dynamics of deep networks: we experimentally show that at an earlier training stage, the model mainly learns to recognize some 'easier-to-learn' discriminative patterns within each example, e.g., the lower-frequency components of images and the original information before data augmentation. Driven by this phenomenon, we propose a curriculum where the model always leverages all the training data at each epoch, while the curriculum starts with only exposing the 'easier-to-learn' patterns of each example, and introduces gradually more difficult patterns. To implement this idea, we 1) introduce a cropping operation in the Fourier spectrum of the inputs, which enables the model to learn from only the lower-frequency components efficiently, 2) demonstrate that exposing the features of original images amounts to adopting weaker data augmentation, and 3) integrate 1) and 2) and design a curriculum learning schedule with a greedy-search algorithm. The resulting approach, EfficientTrain, is simple, general, yet surprisingly effective. As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, and CSWin) by >1.5x on ImageNet-1K/22K without sacrificing accuracy. It is also effective for self-supervised learning (e.g., MAE). Code is available at https://github.com/LeapLabTHU/EfficientTrain.
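The "cropping operation in the Fourier spectrum" mentioned above can be illustrated with a short sketch: transform an image to the frequency domain, keep only the central (low-frequency) block of the shifted spectrum, and transform back, yielding a smaller image that contains only the easier-to-learn low-frequency content. This is a hypothetical NumPy illustration of the idea, not the authors' implementation; the function name `low_freq_crop` and the `bandwidth` parameter are assumptions for this example.

```python
import numpy as np

def low_freq_crop(image, bandwidth):
    """Keep only the central bandwidth x bandwidth block of the image's
    2D Fourier spectrum and invert it back to pixel space.

    Hypothetical sketch of the low-frequency cropping idea from the
    abstract. `image` is a 2D float array; `bandwidth` (even) is the
    side length of the retained low-frequency block.
    """
    h, w = image.shape
    # fftshift moves the DC (zero-frequency) component to the center,
    # so low frequencies form a central block we can crop.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    cy, cx = h // 2, w // 2
    b = bandwidth // 2
    cropped = spectrum[cy - b:cy + b, cx - b:cx + b]
    # The inverse transform of the cropped spectrum is a smaller image
    # carrying only the low-frequency content, which is cheaper to
    # train on in early epochs.
    return np.real(np.fft.ifft2(np.fft.ifftshift(cropped)))
```

Cropping with the full bandwidth is a no-op (the whole spectrum is kept), while smaller bandwidths produce proportionally smaller, low-pass-filtered inputs, which is what makes the early training stages cheaper.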
