LSD-C: Linearly Separable Deep Clusters

Paper Title

LSD-C: Linearly Separable Deep Clusters

Authors

Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Kai Han, Andrea Vedaldi, Andrew Zisserman

Abstract

We present LSD-C, a novel method to identify clusters in an unlabeled dataset. Our algorithm first establishes pairwise connections in the feature space between the samples of the minibatch based on a similarity metric. Then it regroups in clusters the connected samples and enforces a linear separation between clusters. This is achieved by using the pairwise connections as targets together with a binary cross-entropy loss on the predictions that the associated pairs of samples belong to the same cluster. This way, the feature representation of the network will evolve such that similar samples in this feature space will belong to the same linearly separated cluster. Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation. We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
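The pairwise objective described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The abstract only says connections come from "a similarity metric", so the cosine-similarity nearest-neighbor rule used here is an assumption, as is modeling the same-cluster prediction as the inner product of two softmax cluster assignments. The function names (`pairwise_targets`, `pairwise_bce`) and the parameter `k` are hypothetical.

```python
import numpy as np

def pairwise_targets(features, k=1):
    """Binary connection matrix for a minibatch.

    Assumption: two samples are connected when one is among the k most
    cosine-similar samples of the other. The source does not fix this rule.
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)              # exclude self from the neighbor search
    n = sim.shape[0]
    nearest = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar samples
    targets = np.zeros((n, n))
    targets[np.arange(n)[:, None], nearest] = 1.0
    targets = np.maximum(targets, targets.T)    # make connections symmetric
    np.fill_diagonal(targets, 1.0)              # every sample is connected to itself
    return targets

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def pairwise_bce(logits, targets, eps=1e-7):
    """Binary cross-entropy between connections and same-cluster predictions.

    Assumption: the probability that samples i and j share a cluster is the
    inner product of their softmax cluster assignments.
    """
    probs = softmax(logits)
    same = np.clip(probs @ probs.T, eps, 1.0 - eps)
    loss = -(targets * np.log(same) + (1.0 - targets) * np.log(1.0 - same))
    return loss.mean()

# Toy minibatch: two pairs of nearby points in a 2-D feature space.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
targets = pairwise_targets(features, k=1)

# Cluster logits that agree with the connections, and logits that split them.
agreeing = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]])
splitting = np.array([[5.0, 0.0], [0.0, 5.0], [5.0, 0.0], [0.0, 5.0]])
loss_agreeing = pairwise_bce(agreeing, targets)
loss_splitting = pairwise_bce(splitting, targets)
```

In this toy case the loss is lower when connected samples receive the same cluster assignment, which is the pressure that groups connected samples together. In a real network the logits would come from a linear layer on top of the learned features, so minimizing this loss would also push the clusters toward linear separability.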
