Paper Title
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Paper Authors
Paper Abstract
Mixup is a commonly adopted data augmentation technique for image classification. Recent advances in mixup methods primarily focus on mixing based on saliency. However, many saliency detectors require intense computation and are especially burdensome for parameter-heavy transformer models. To this end, we propose TokenMixup, an efficient attention-guided token-level data augmentation method that aims to maximize the saliency of a mixed set of tokens. TokenMixup provides saliency-aware data augmentation roughly 15× faster than gradient-based methods. Moreover, we introduce a variant of TokenMixup which mixes tokens within a single instance, thereby enabling multi-scale feature augmentation. Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K, while being more efficient than previous methods. We also achieve state-of-the-art performance on CIFAR-100 among transformer models trained from scratch. Code is available at https://github.com/mlvlab/TokenMixup.
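The abstract describes attention-guided token mixing only at a high level. The following is a minimal PyTorch sketch of one plausible reading of the cross-sample case: a sample's least attended tokens are replaced with the most salient tokens from another sample, and labels are mixed by the fraction of tokens swapped. The saliency definition (per-token attention scores), the top-k matching strategy, and the token-count mixing ratio are simplifying assumptions for illustration, not the paper's exact algorithm; see the linked repository for the authors' implementation.

```python
import torch


def token_mixup_sketch(x_a, x_b, attn_a, attn_b, y_a, y_b, ratio=0.25):
    """Illustrative sketch of attention-guided token-level mixing.

    x_a, x_b:       (B, N, D) token embeddings of two batches
    attn_a, attn_b: (B, N) per-token saliency scores, e.g. the mean
                    attention each token receives (assumed proxy)
    y_a, y_b:       (B, C) one-hot or soft labels
    ratio:          fraction of tokens to replace in x_a
    """
    B, N, _ = x_a.shape
    k = max(1, int(N * ratio))

    # Indices of the k LEAST salient tokens in x_a (to be overwritten)...
    idx_a = attn_a.topk(k, dim=1, largest=False).indices  # (B, k)
    # ...and the k MOST salient tokens in x_b (to be pasted in).
    idx_b = attn_b.topk(k, dim=1, largest=True).indices   # (B, k)

    # Replace the selected tokens via batched advanced indexing.
    x_mix = x_a.clone()
    batch = torch.arange(B).unsqueeze(1)                   # (B, 1)
    x_mix[batch, idx_a] = x_b[batch, idx_b]

    # Mix labels in proportion to the number of tokens contributed by
    # each source (a simplified convention assumed for this sketch).
    lam = 1.0 - k / N
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix
```

Because the saliency scores come from attention maps the model already computes in its forward pass, a scheme like this avoids the extra backward pass that gradient-based saliency detectors require, which is the efficiency argument the abstract makes.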