Paper Title
The Hidden Uniform Cluster Prior in Self-Supervised Learning
Paper Authors
Paper Abstract
A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that the formulation of all these methods contains an overlooked prior to learn features that enable uniform clustering of the data. While this prior has led to remarkably semantic representations when pretraining on class-balanced data, such as ImageNet, we demonstrate that it can hamper performance when pretraining on class-imbalanced data. By moving away from conventional uniformity priors and instead preferring power-law distributed feature clusters, we show that one can improve the quality of the learned representations on real-world class-imbalanced datasets. To demonstrate this, we develop an extension of the Masked Siamese Networks (MSN) method to support the use of arbitrary feature priors.
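The following is a minimal sketch, not the paper's implementation, of how a power-law feature prior could replace the implicit uniform prior in a clustering-based regularizer. The function names `power_law_prior` and `prior_matching_regularizer`, the exponent `tau`, and the rank-matching via sorting are illustrative assumptions.

```python
import torch

def power_law_prior(num_clusters: int, tau: float = 0.25) -> torch.Tensor:
    # Target distribution over cluster ranks: p_k proportional to k**(-tau).
    # tau = 0 recovers the uniform prior; larger tau favors heavier imbalance.
    ranks = torch.arange(1, num_clusters + 1, dtype=torch.float32)
    weights = ranks.pow(-tau)
    return weights / weights.sum()

def prior_matching_regularizer(assignments: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    # `assignments`: (batch_size, num_clusters) soft cluster assignments, rows sum to 1.
    # Penalize the KL divergence between the mini-batch's mean assignment
    # distribution and the target prior. With a uniform prior this is, up to a
    # constant, a mean-entropy-maximization term that encourages uniform
    # clustering; a power-law prior instead permits imbalanced cluster sizes.
    mean_assignment = assignments.mean(dim=0)
    # Align empirical clusters with prior ranks by sorting (an illustrative choice).
    mean_sorted, _ = torch.sort(mean_assignment, descending=True)
    eps = 1e-8
    return torch.sum(mean_sorted * (torch.log(mean_sorted + eps) - torch.log(prior + eps)))

# Example: regularize a batch of softmax cluster assignments toward a power-law prior.
logits = torch.randn(256, 1024)            # hypothetical prototype similarities
assignments = torch.softmax(logits, dim=-1)
reg = prior_matching_regularizer(assignments, power_law_prior(1024))
```

In this sketch, swapping the prior passed to the regularizer is the only change needed to move between uniform and power-law clustering preferences; how such a term is weighted against the main self-supervised objective is a training design choice.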