Paper Title
Accelerating Vision Transformer Training via a Patch Sampling Schedule
Paper Authors
Paper Abstract
We introduce the notion of a Patch Sampling Schedule (PSS), which varies the number of Vision Transformer (ViT) patches used per batch during training. Since not all patches are equally important for most vision objectives (e.g., classification), we argue that less important patches can be used in fewer training iterations, leading to shorter training time with minimal impact on performance. Additionally, we observe that training with a PSS makes a ViT more robust to a wider patch sampling range during inference. This allows for a fine-grained, dynamic trade-off between throughput and accuracy during inference. We evaluate PSSs on ViTs for ImageNet, both trained from scratch and pre-trained using a reconstruction loss function. For the pre-trained model, we achieve a 0.26% reduction in classification accuracy for a 31% reduction in training time (from 25 to 17 hours) compared to using all patches in each iteration. Code, model checkpoints, and logs are available at https://github.com/BradMcDanel/pss.
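The core idea can be illustrated with a minimal sketch: a schedule maps the training step to a fraction of patches to keep, and a sampler randomly drops the remaining patch tokens from each batch. The linear ramp, the function names, and the uniform random sampling here are illustrative assumptions for exposition; the paper's actual schedule shapes and sampling strategy may differ.

```python
import numpy as np

def patch_keep_fraction(step, total_steps, min_frac=0.5, max_frac=1.0):
    """Hypothetical schedule: linearly ramp the kept-patch fraction
    from min_frac to max_frac over the course of training."""
    return min_frac + (max_frac - min_frac) * step / max(1, total_steps - 1)

def sample_patches(patch_tokens, keep_frac, rng):
    """Keep a random subset of patch tokens per example.

    patch_tokens: array of shape (batch, num_patches, dim)
    Returns an array of shape (batch, k, dim), where
    k = round(keep_frac * num_patches).
    """
    b, n, d = patch_tokens.shape
    k = max(1, int(round(keep_frac * n)))
    # Sample k patch indices per example, uniformly without replacement.
    idx = np.stack([rng.choice(n, size=k, replace=False) for _ in range(b)])
    return np.take_along_axis(patch_tokens, idx[:, :, None], axis=1)

# Usage: early in training only half the patches are used, so each
# forward/backward pass touches fewer tokens and runs faster.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 196, 8))  # e.g., 14x14 = 196 patches
frac = patch_keep_fraction(step=0, total_steps=100)  # 0.5 at the start
subset = sample_patches(tokens, frac, rng)           # shape (2, 98, 8)
```

Because self-attention cost grows quadratically with the number of tokens, halving the patch count early in training reduces per-iteration compute substantially, which is the source of the reported wall-clock savings.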