Paper Title
Self-Evolutionary Clustering
Authors
Abstract
Deep clustering outperforms conventional clustering by mutually promoting representation learning and cluster assignment. However, most existing deep clustering methods suffer from two major drawbacks. First, most cluster assignment methods rely on simple distance comparisons and depend heavily on a target distribution generated by a handcrafted nonlinear mapping, which largely limits the performance that deep clustering methods can achieve. Second, the clustering results can easily be steered in the wrong direction by misassigned samples in each cluster, and existing deep clustering methods are incapable of discriminating such samples. To address these issues, a novel modular Self-Evolutionary Clustering (Self-EvoC) framework is constructed, which boosts clustering performance through classification in a self-supervised manner. Fuzzy theory is used to assign each sample a probabilistic membership score that evaluates the certainty of its intermediate clustering result. Based on these scores, the most reliable samples are selected and augmented. The augmented data, labeled by the clustering, are used to fine-tune an off-the-shelf deep network classifier, which then generates the target distribution. With the assistance of this self-supervised classifier, the proposed framework can efficiently discriminate sample outliers and generate a better target distribution. Extensive experiments indicate that Self-EvoC remarkably outperforms state-of-the-art deep clustering methods on three benchmark datasets.
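The abstract does not state the exact fuzzy scoring rule. As a minimal sketch, assuming standard fuzzy c-means memberships (fuzzifier `m`) computed against the current cluster centers, a sample's certainty can be taken as its largest membership, and the most certain samples in each cluster kept as reliable pseudo-labeled data. The function names `fuzzy_memberships` and `select_reliable` and the `ratio` parameter are hypothetical illustrations, not the authors' code. The augmentation and classifier fine-tuning steps are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fuzzy_memberships(points: Sequence[Sequence[float]],
                      centers: Sequence[Sequence[float]],
                      m: float = 2.0,
                      eps: float = 1e-12) -> List[List[float]]:
    """Fuzzy c-means membership u[i][j] of sample i in cluster j (assumed rule).

    u[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)); each row sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Euclidean distance to every center; eps avoids division by zero
        dists = [max(math.dist(x, c), eps) for c in centers]
        row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships


def select_reliable(memberships: Sequence[Sequence[float]],
                    ratio: float = 0.5) -> List[Tuple[int, int]]:
    """Per cluster, keep the `ratio` fraction of its most certain samples.

    Pseudo-label of sample i = argmax_j u[i][j]; certainty = max_j u[i][j].
    Returns sorted (sample_index, pseudo_label) pairs.
    """
    by_cluster: Dict[int, List[Tuple[float, int]]] = {}
    for i, row in enumerate(memberships):
        label = max(range(len(row)), key=row.__getitem__)
        by_cluster.setdefault(label, []).append((row[label], i))
    selected = []
    for label, items in by_cluster.items():
        items.sort(reverse=True)  # most certain samples first
        keep = max(1, math.ceil(ratio * len(items)))
        selected.extend((i, label) for _, i in items[:keep])
    return sorted(selected)


points = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [5.0, 5.0], [5.1, 4.9], [3.0, 3.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
U = fuzzy_memberships(points, centers)
print(select_reliable(U, ratio=0.5))  # → [(0, 0), (1, 0), (3, 1), (4, 1)]
```

In this toy run, the point at `[3.0, 3.0]` lies between the two centers, receives a low certainty (about 0.69), and is excluded, which mirrors the abstract's idea of keeping ambiguous, possibly misassigned samples out of the classifier's training set.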