Paper Title

Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning

Authors

Haohang Xu, Xiaopeng Zhang, Hao Li, Lingxi Xie, Hongkai Xiong, Qi Tian

Abstract

Self-supervised learning based on instance discrimination has shown remarkable progress. In particular, contrastive learning, which regards each image as well as its augmentations as an individual class and tries to distinguish them from all other images, has been verified effective for representation learning. However, pushing away two images that are de facto similar is suboptimal for general representation. In this paper, we propose a hierarchical semantic alignment strategy via expanding the views generated by a single image to \textbf{Cross-samples and Multi-level} representation, and model the invariance to semantically similar images in a hierarchical way. This is achieved by extending the contrastive loss to allow for multiple positives per anchor, and explicitly pulling semantically similar images/patches together at different layers of the network. Our method, termed CsMl, has the ability to integrate multi-level visual representations across samples in a robust way. CsMl is applicable to current contrastive learning based methods and consistently improves the performance. Notably, using MoCo as an instantiation, CsMl achieves a \textbf{76.6\%} top-1 accuracy with linear evaluation using ResNet-50 as the backbone, and \textbf{66.7\%} and \textbf{75.1\%} top-1 accuracy with only 1\% and 10\% labels, respectively. \textbf{All these numbers set the new state-of-the-art.}
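The key mechanism described in the abstract is a contrastive loss extended to admit multiple positives per anchor. The sketch below is a minimal illustration of one standard way to formulate such a loss (a SupCon-style multi-positive InfoNCE), not the authors' released implementation; the function name, the `pos_mask` construction, and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(anchors, candidates, pos_mask, temperature=0.2):
    """Contrastive loss that allows multiple positives per anchor (illustrative sketch).

    anchors:    (N, D) anchor embeddings.
    candidates: (M, D) embeddings of augmented views / cross-sample neighbors.
    pos_mask:   (N, M) boolean mask; True where candidate j counts as a positive
                (e.g. a semantically similar image or patch) for anchor i.
    """
    anchors = F.normalize(anchors, dim=1)
    candidates = F.normalize(candidates, dim=1)
    # Cosine-similarity logits, scaled by temperature.
    logits = anchors @ candidates.t() / temperature
    # Log-softmax over all candidates for each anchor.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Average the log-likelihood over each anchor's positives,
    # then average over anchors; clamp guards anchors with no positive.
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()
```

With a single positive per anchor, `pos_mask` has one True entry per row and this reduces to the ordinary InfoNCE loss; applying it with masks built from semantically similar images/patches at several network depths would correspond to the cross-sample, multi-level alignment the abstract describes.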
