Paper Title

OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning

Paper Authors

Spyros Gidaris, Andrei Bursuc, Gilles Puy, Nikos Komodakis, Matthieu Cord, Patrick Pérez

Paper Abstract

Learning image representations without human supervision is an important and active research field. Several recent approaches have successfully leveraged the idea of making such a representation invariant under different types of perturbations, especially via contrastive-based instance discrimination training. Although effective visual representations should indeed exhibit such invariances, there are other important characteristics, such as encoding contextual reasoning skills, for which alternative reconstruction-based approaches might be better suited. With this in mind, we propose a teacher-student scheme to learn representations by training a convolutional net to reconstruct a bag-of-visual-words (BoW) representation of an image, given as input a perturbed version of that same image. Our strategy performs an online training of both the teacher network (whose role is to generate the BoW targets) and the student network (whose role is to learn representations), along with an online update of the visual-words vocabulary (used for the BoW targets). This idea effectively enables fully online BoW-guided unsupervised learning. Extensive experiments demonstrate the interest of our BoW-based strategy, which surpasses previous state-of-the-art methods (including contrastive-based ones) in several applications. For instance, in downstream tasks such as Pascal object detection, Pascal classification, and Places205 classification, our method improves over all prior unsupervised approaches, thus establishing new state-of-the-art results that are also significantly better even than those of supervised pre-training. We provide the implementation code at https://github.com/valeoai/obow.
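
To make the training scheme described in the abstract concrete, below is a minimal PyTorch-style sketch of online BoW-guided learning: a momentum (EMA) teacher produces a soft bag-of-visual-words target from the clean image, the student predicts that distribution from a perturbed view, and the visual-word vocabulary is refreshed online. All names here (`student`, `teacher`, `bow_head`, `vocab`) and the random-replacement vocabulary update are illustrative assumptions for this sketch, not the authors' exact implementation; see the linked repository for the real method, which uses a more elaborate BoW-prediction head and vocabulary-update rule.

```python
# Minimal sketch of online BoW-guided self-supervised training (illustrative only).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

K, D = 128, 64   # assumed vocabulary size / local-feature dimension (toy values)

def make_backbone():
    # Tiny conv encoder standing in for the paper's ResNet feature extractor.
    return nn.Sequential(
        nn.Conv2d(3, D, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(D, D, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    )

student = make_backbone()
bow_head = nn.Linear(D, K)            # predicts a distribution over the K visual words
teacher = copy.deepcopy(student)      # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

vocab = F.normalize(torch.randn(K, D), dim=1)   # online-updated visual-word vocabulary
opt = torch.optim.SGD(list(student.parameters()) + list(bow_head.parameters()), lr=0.05)

@torch.no_grad()
def bow_target(img):
    # Teacher: extract local features, softly assign them to visual words,
    # and pool the assignments into a normalized bag-of-words histogram.
    feat = teacher(img)                                          # (B, D, H, W)
    feat = F.normalize(feat.flatten(2).transpose(1, 2), dim=-1)  # (B, H*W, D)
    assign = F.softmax(feat @ vocab.t() / 0.1, dim=-1)           # soft assignments
    bow = assign.sum(dim=1)
    return bow / bow.sum(dim=1, keepdim=True)                    # (B, K)

def train_step(img, img_perturbed, momentum=0.99):
    target = bow_target(img)                        # BoW target from the unperturbed image
    # Student: predict the BoW distribution from the perturbed view.
    feat = student(img_perturbed).mean(dim=(2, 3))  # global average pooling
    log_pred = F.log_softmax(bow_head(feat), dim=-1)
    loss = -(target * log_pred).sum(dim=1).mean()   # cross-entropy against the soft target
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        # Online teacher update: exponential moving average of the student weights.
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
        # Online vocabulary update (one simple variant, assumed here): overwrite a few
        # random vocabulary entries with randomly sampled teacher local features.
        local = F.normalize(teacher(img).flatten(2).transpose(1, 2).reshape(-1, D), dim=-1)
        slots = torch.randint(0, K, (K // 16,))
        vocab[slots] = local[torch.randint(0, local.shape[0], (K // 16,))]
    return loss.item()

# Toy usage: a batch of images and a perturbed version of the same batch.
x = torch.rand(4, 3, 32, 32)
print(train_step(x, x + 0.1 * torch.randn_like(x)))
```

The key point the sketch tries to convey is that nothing is precomputed offline: the teacher, the student, and the vocabulary all evolve together during training, which is what makes the BoW guidance fully online.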
