Title
Deep Clustering with Features from Self-Supervised Pretraining
Authors
Abstract
A deep clustering model conceptually consists of a feature extractor that maps data points to a latent space, and a clustering head that groups the data points into clusters in that latent space. Although the two components used to be trained jointly in an end-to-end fashion, recent work has shown it beneficial to train them separately in two stages. In the first stage, the feature extractor is trained via self-supervised learning, which preserves the cluster structure among the data points. To preserve the cluster structure even better, we propose to replace the first stage with a model pretrained on a much larger dataset via self-supervised learning. The method is simple and might suffer from domain shift. Nonetheless, we show empirically that it achieves superior clustering performance. When a vision transformer (ViT) architecture is used for feature extraction, our method achieves clustering accuracies of 94.0%, 55.6%, and 97.9% on CIFAR-10, CIFAR-100, and STL-10, respectively. The corresponding previous state-of-the-art results are 84.3%, 47.7%, and 80.8%. Our code will be available online with the publication of the paper.
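The two-stage pipeline in the abstract can be sketched in a few lines. This is a minimal toy illustration, not the paper's method: the frozen self-supervised extractor (a pretrained ViT in the paper) is mocked here by a fixed random linear projection, and the clustering head is plain k-means with k = 2 on synthetic blob data.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(images: np.ndarray) -> np.ndarray:
    """Stage 1 stand-in: a frozen map from inputs to a latent space.
    A real pipeline would run a pretrained self-supervised encoder
    (e.g. a ViT) here; a fixed random projection is used for the toy."""
    proj = np.random.default_rng(1).standard_normal((images.shape[1], 16))
    return images @ proj

def kmeans(features: np.ndarray, iters: int = 20) -> np.ndarray:
    """Stage 2: plain k-means (k=2) with farthest-point initialization."""
    c0 = features[0]
    c1 = features[np.linalg.norm(features - c0, axis=1).argmax()]
    centroids = np.stack([c0, c1])
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute means.
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(2):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

# Toy "dataset": two well-separated blobs standing in for two image classes.
images = np.vstack([
    rng.standard_normal((50, 32)) + 5.0,
    rng.standard_normal((50, 32)) - 5.0,
])
labels = kmeans(extract_features(images))
# Each blob should land in a single cluster, and the two clusters differ.
print(len(set(labels[:50].tolist())), len(set(labels[50:].tolist())))
```

The point of the sketch is the separation of concerns: the extractor is never updated during clustering, so any frozen pretrained model can be dropped in for `extract_features` without touching the clustering head.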