Paper Title

Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment

Authors

Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa

Abstract

Vision Transformer (ViT) is becoming increasingly popular in image processing. We investigate the effectiveness on ViT of test-time adaptation (TTA), a recently emerged technique in which a model corrects its own predictions during test time. First, we benchmark various test-time adaptation approaches on ViT-B16 and ViT-L16. The results show that TTA is effective on ViT and that the prior convention (sensibly selecting modulation parameters) is not necessary when a proper loss function is used. Based on this observation, we propose a new test-time adaptation method called class-conditional feature alignment (CFA), which minimizes both the class-conditional distribution differences and the whole distribution differences of the hidden representation between the source and target in an online manner. Experiments on image classification tasks under common corruption (CIFAR-10-C, CIFAR-100-C, and ImageNet-C) and domain adaptation (digits datasets and ImageNet-Sketch) show that CFA stably outperforms the existing baselines across datasets. We also verify that CFA is model agnostic by experimenting on ResNet, MLP-Mixer, and several ViT variants (ViT-AugReg, DeiT, and BeiT). Using a BeiT backbone, CFA achieves a 19.8% top-1 error rate on ImageNet-C, outperforming the existing test-time adaptation baseline of 44.0%. This is a state-of-the-art result among TTA methods that do not need to alter the training phase.
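
The abstract describes CFA only at a high level: it aligns both the overall and the class-conditional distributions of hidden features between source and target. As a rough illustration of that idea, and not the authors' implementation, the sketch below assumes that alignment means matching feature means (first moments only), that target classes are assigned by argmax pseudo-labels, and that source statistics were precomputed. The function name and all parameter names are hypothetical.

```python
import numpy as np

def feature_alignment_loss(target_feats, target_logits, source_mean, source_class_means):
    """Hypothetical mean-matching loss between source statistics and a target batch.

    target_feats:       (N, D) hidden representations of a target batch
    target_logits:      (N, C) model outputs, used here to form pseudo-labels
    source_mean:        (D,)   mean feature precomputed on source data (assumed available)
    source_class_means: (C, D) per-class mean features precomputed on source data
    """
    # Whole-distribution term: squared distance between overall feature means.
    overall = np.sum((target_feats.mean(axis=0) - source_mean) ** 2)

    # Class-conditional term: group target features by pseudo-label (argmax),
    # then compare each group's mean with the matching source class mean.
    pseudo_labels = target_logits.argmax(axis=1)
    present = np.unique(pseudo_labels)
    conditional = 0.0
    for c in present:
        class_mean = target_feats[pseudo_labels == c].mean(axis=0)
        conditional += np.sum((class_mean - source_class_means[c]) ** 2)
    conditional /= len(present)  # average over classes present in the batch

    return overall + conditional
```

In an online setting, a loss of this kind would be computed on each incoming target batch and minimized with respect to the model parameters; which parameters to update and which higher-order statistics to match are design choices the abstract does not spell out.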
