Paper Title

Masked Siamese Networks for Label-Efficient Learning

Authors

Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas

Abstract

We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
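
The core mechanism described in the abstract (matching the representation of a randomly masked view to that of the full image, with only the retained patches passed through the anchor branch) can be illustrated with a minimal sketch. This is a hypothetical PyTorch illustration, not the authors' implementation: the names `random_patch_mask`, `anchor_encoder`, `target_encoder`, and `prototypes` are placeholders, and details of the actual method (ViT [CLS] token, exponential-moving-average target encoder, Sinkhorn normalization, entropy regularization) are omitted.

```python
import torch
import torch.nn.functional as F

def random_patch_mask(patches, keep_ratio=0.3):
    # Keep a random subset of patch tokens per image; dropped patches are
    # simply never fed to the anchor encoder, which is what makes the
    # pre-training cheap when the encoder is a Vision Transformer.
    B, N, D = patches.shape
    n_keep = max(1, int(N * keep_ratio))
    idx = torch.rand(B, N, device=patches.device).argsort(dim=1)[:, :n_keep]
    return torch.gather(patches, 1, idx.unsqueeze(-1).expand(-1, -1, D))

def msn_style_loss(anchor_encoder, target_encoder, prototypes, patches, tau=0.1):
    # Match the prototype assignment of the masked (anchor) view to the
    # assignment of the full (target) view.
    masked = random_patch_mask(patches)                       # anchor sees unmasked patches only
    z_anchor = F.normalize(anchor_encoder(masked).mean(dim=1), dim=-1)
    with torch.no_grad():                                      # target branch gets no gradients
        z_target = F.normalize(target_encoder(patches).mean(dim=1), dim=-1)
    protos = F.normalize(prototypes, dim=-1)
    p_anchor = F.softmax(z_anchor @ protos.t() / tau, dim=-1)  # soft cluster assignments
    p_target = F.softmax(z_target @ protos.t() / tau, dim=-1)
    # Cross-entropy between the two assignment distributions.
    return -(p_target * torch.log(p_anchor + 1e-8)).sum(dim=-1).mean()
```

In this sketch the encoders are assumed to map a sequence of patch tokens to per-token features; averaging the tokens stands in for the pooled image representation used in practice.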
