Paper Title
SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
Paper Authors
Paper Abstract
Recently, self-supervised Masked Autoencoders (MAE) have attracted unprecedented attention for their impressive representation learning ability. However, the pretext task, Masked Image Modeling (MIM), reconstructs missing local patches and lacks a global understanding of the image. This paper extends MAE to a fully supervised setting by adding a supervised classification branch, thereby enabling MAE to learn global features effectively from gold labels. The proposed Supervised MAE (SupMAE) exploits only a visible subset of image patches for classification, unlike standard supervised pre-training, where all image patches are used. Through experiments, we demonstrate that SupMAE is not only more training-efficient but also learns more robust and transferable features. Specifically, SupMAE achieves performance comparable to MAE while using only 30% of the compute when evaluated on ImageNet with the ViT-B/16 model. SupMAE also outperforms both MAE and standard supervised pre-training in robustness on ImageNet variants and in transfer learning performance. Code is available at https://github.com/enyac-group/supmae.
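To make the described architecture concrete, below is a minimal PyTorch sketch of the SupMAE training objective: the encoder sees only the visible patch subset, a classification head is applied to the pooled visible-token features, and a lightweight decoder reconstructs the masked patches. The module sizes, the 0.75 mask ratio, the global-average pooling, the omission of positional embeddings, and the equal loss weighting are illustrative assumptions rather than the official implementation; see the repository linked above for the authors' code.

```python
# Minimal sketch of the SupMAE objective (illustrative, not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SupMAESketch(nn.Module):
    def __init__(self, patch_dim=768, embed_dim=768, num_classes=1000, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.patch_embed = nn.Linear(patch_dim, embed_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=12, batch_first=True),
            num_layers=12)
        # Lightweight decoder reconstructs the masked patches (MIM branch).
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True),
            num_layers=2)
        self.recon_head = nn.Linear(embed_dim, patch_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Supervised branch: classify from the *visible* tokens only.
        self.cls_head = nn.Linear(embed_dim, num_classes)

    def random_mask(self, x):
        # Keep a random subset of tokens; return visible tokens, a binary mask
        # (1 = masked), and indices to restore the original patch order.
        B, N, D = x.shape
        n_keep = int(N * (1 - self.mask_ratio))
        noise = torch.rand(B, N, device=x.device)
        ids_shuffle = noise.argsort(dim=1)
        ids_keep = ids_shuffle[:, :n_keep]
        ids_restore = ids_shuffle.argsort(dim=1)
        x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
        mask = torch.ones(B, N, device=x.device)
        mask.scatter_(1, ids_keep, 0)
        return x_visible, mask, ids_restore

    def forward(self, patches, labels):
        # patches: (B, N, patch_dim) flattened image patches; labels: (B,)
        # Positional embeddings are omitted here for brevity.
        tokens = self.patch_embed(patches)
        visible, mask, ids_restore = self.random_mask(tokens)
        latent = self.encoder(visible)  # encoder sees only visible patches

        # Supervised branch: global-average-pool the visible tokens.
        logits = self.cls_head(latent.mean(dim=1))
        loss_cls = F.cross_entropy(logits, labels)

        # MIM branch: append mask tokens, restore order, reconstruct patches.
        B, N = mask.shape
        mask_tokens = self.mask_token.expand(B, N - latent.size(1), -1)
        full = torch.cat([latent, mask_tokens], dim=1)
        full = torch.gather(
            full, 1, ids_restore.unsqueeze(-1).expand(-1, -1, full.size(-1)))
        pred = self.recon_head(self.decoder(full))
        loss_rec = (((pred - patches) ** 2).mean(dim=-1) * mask).sum() / mask.sum()

        return loss_rec + loss_cls  # equal weighting is an assumption


# Usage: one dummy training step with random patches and labels.
model = SupMAESketch()
patches = torch.randn(2, 196, 768)
labels = torch.randint(0, 1000, (2,))
loss = model(patches, labels)
loss.backward()
```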