Paper Title


Joint Optimization of an Autoencoder for Clustering and Embedding

Authors

Boubekki, Ahcène, Kampffmeyer, Michael, Jenssen, Robert, Brefeld, Ulf

Abstract


Deep embedded clustering has become a dominating approach to unsupervised categorization of objects with deep neural networks. The optimization of the most popular methods alternates between the training of a deep autoencoder and a k-means clustering of the autoencoder's embedding. The diachronic setting, however, prevents the former from benefiting from valuable information acquired by the latter. In this paper, we present an alternative where the autoencoder and the clustering are learned simultaneously. This is achieved by providing novel theoretical insight, where we show that the objective function of a certain class of Gaussian mixture models (GMMs) can naturally be rephrased as the loss function of a one-hidden-layer autoencoder, thus inheriting the built-in clustering capabilities of the GMM. That simple neural network, referred to as the clustering module, can be integrated into a deep autoencoder, resulting in a deep clustering model able to jointly learn a clustering and an embedding. Experiments confirm the equivalence between the clustering module and Gaussian mixture models. Further evaluations affirm the empirical relevance of our deep architecture, as it outperforms related baselines on several data sets.
