Paper Title
CO2: Consistent Contrast for Unsupervised Visual Representation Learning
Paper Authors
Paper Abstract
Contrastive learning has been adopted as a core method for unsupervised visual representation learning. Without human annotation, the common practice is to perform an instance discrimination task: given a query image crop, this task labels crops from the same image as positives, and crops from other randomly sampled images as negatives. An important limitation of this label assignment strategy is that it cannot reflect the heterogeneous similarity between the query crop and each crop from other images: all of them are treated as equally negative, even though some may belong to the same semantic class as the query. To address this issue, inspired by consistency regularization in semi-supervised learning on unlabeled data, we propose Consistent Contrast (CO2), which introduces a consistency regularization term into the current contrastive learning framework. Regarding the similarity of the query crop to each crop from other images as "unlabeled", the consistency term takes the corresponding similarity of a positive crop as a pseudo label, and encourages consistency between these two similarities. Empirically, CO2 improves Momentum Contrast (MoCo) by 2.9% top-1 accuracy on the ImageNet linear protocol, and by 3.8% and 1.1% top-5 accuracy in the 1% and 10% labeled semi-supervised settings. It also transfers to image classification, object detection, and semantic segmentation on PASCAL VOC. This shows that CO2 learns better visual representations for these downstream tasks.
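The abstract's core idea — treat the positive crop's similarity distribution over the negatives as a soft pseudo label for the query's distribution — can be sketched as a loss function. This is a minimal NumPy illustration, not the paper's implementation: the temperature `tau`, the weight `alpha`, and the use of a symmetric KL divergence for the consistency term are assumptions made here for concreteness.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def co2_loss(q, k_pos, negatives, tau=0.2, alpha=0.3):
    """Sketch of a CO2-style objective: InfoNCE + consistency term.

    q, k_pos   : L2-normalized feature vectors (D,) of the query crop
                 and its positive crop (two crops of the same image).
    negatives  : (N, D) L2-normalized features of crops from other images.
    tau, alpha : illustrative hyperparameter values (assumptions).
    """
    # Instance-discrimination (InfoNCE) term: positive vs. negatives.
    l_pos = q @ k_pos / tau
    l_neg = negatives @ q / tau
    logits = np.concatenate(([l_pos], l_neg))
    info_nce = -logits[0] + np.log(np.exp(logits).sum())

    # Consistency term: the positive crop's similarity distribution over
    # the negatives acts as a soft pseudo label for the query's
    # distribution. A symmetric KL is used here (an assumption).
    p_q = softmax(negatives @ q / tau)
    p_k = softmax(negatives @ k_pos / tau)
    kl = lambda p, r: np.sum(p * (np.log(p) - np.log(r)))
    consistency = 0.5 * (kl(p_q, p_k) + kl(p_k, p_q))

    return info_nce + alpha * consistency
```

Note that when the two crops' features coincide (`q == k_pos`), the consistency term vanishes and the loss reduces to plain InfoNCE, which is the sense in which CO2 is a regularizer on top of the existing contrastive framework.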