Paper Title

Self-Supervised Visual Representation Learning via Residual Momentum

Paper Authors

Trung X. Pham, Axi Niu, Zhang Kang, Sultan Rizky Madjid, Ji Woo Hong, Daehyeok Kim, Joshua Tian Jin Tee, Chang D. Yoo

Paper Abstract

Self-supervised learning (SSL) approaches have shown promising capabilities in learning the representation from unlabeled data. Amongst them, momentum-based frameworks have attracted significant attention. Despite being a great success, these momentum-based SSL frameworks suffer from a large gap in representation between the online encoder (student) and the momentum encoder (teacher), which hinders performance on downstream tasks. This paper is the first to investigate and identify this invisible gap as a bottleneck that has been overlooked in the existing SSL frameworks, potentially preventing the models from learning good representation. To solve this problem, we propose "residual momentum" to directly reduce this gap to encourage the student to learn the representation as close to that of the teacher as possible, narrow the performance gap with the teacher, and significantly improve the existing SSL. Our method is straightforward, easy to implement, and can be easily plugged into other SSL frameworks. Extensive experimental results on numerous benchmark datasets and diverse network architectures have demonstrated the effectiveness of our method over the state-of-the-art contrastive learning baselines.
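The abstract does not spell out the exact training objective. A minimal sketch, assuming the teacher is the usual exponential-moving-average (EMA) copy of the student used in momentum-based SSL, and assuming the proposed "residual momentum" term is a cosine-distance penalty on the student-teacher representation gap added to the base SSL loss (the function and parameter names below, e.g. `residual_momentum_loss` and `alpha`, are hypothetical):

```python
import numpy as np

def ema_update(teacher_w, student_w, m=0.99):
    """Momentum (EMA) update of the teacher encoder's weights from the student's."""
    return m * teacher_w + (1.0 - m) * student_w

def cosine_distance(a, b):
    """Row-wise cosine distance (1 - cosine similarity) between embedding batches."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=-1)

def residual_momentum_loss(z_student, z_teacher, ssl_loss, alpha=1.0):
    """Hypothetical combined objective: the base SSL loss plus a residual term
    that directly penalizes the gap between student and teacher embeddings."""
    gap = cosine_distance(z_student, z_teacher).mean()
    return ssl_loss + alpha * gap
```

In use, each training step would compute the framework's original loss (e.g. a contrastive loss), add the gap term on the two encoders' embeddings of the same view, back-propagate through the student only, then apply `ema_update` to the teacher, which is the "plug into other SSL frameworks" property the abstract claims.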
