Paper Title
Unsupervised Embedding of Hierarchical Structure in Euclidean Space
Paper Authors
Paper Abstract
Deep embedding methods have influenced many areas of unsupervised learning. However, the best methods for learning hierarchical structure use non-Euclidean representations, whereas Euclidean geometry underlies the theory behind many hierarchical clustering algorithms. To bridge the gap between these two areas, we consider learning a non-linear embedding of data into Euclidean space as a way to improve the hierarchical clustering produced by agglomerative algorithms. To learn the embedding, we revisit using a variational autoencoder with a Gaussian mixture prior, and we show that rescaling the latent space embedding and then applying Ward's linkage-based algorithm leads to improved results for both dendrogram purity and the Moseley-Wang cost function. Finally, we complement our empirical results with a theoretical explanation of the success of this approach. We study a synthetic model of the embedded vectors and prove that Ward's method exactly recovers the planted hierarchical clustering with high probability.
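The abstract's pipeline ends by running Ward's linkage on embedding vectors and scoring the resulting tree with dendrogram purity. Below is a minimal pure-Python sketch of that last stage. It assumes a hypothetical two-level planted model: four noisy leaf clusters grouped into two well-separated sibling pairs. This is not the paper's synthetic model, and it is not a trained VAE embedding. The VAE training and the latent-space rescaling step are not reproduced. The names `planted_hierarchy`, `ward_linkage`, `ward_cost`, `leaves`, and `dendrogram_purity` are illustrative and are not the authors' code.

```python
import random
from itertools import combinations


def planted_hierarchy(per_leaf=5, seed=0):
    """Hypothetical two-level planted model (not the paper's model).

    Four leaf clusters, grouped into two sibling pairs: the pairs are far
    apart along x, siblings are closer along y, and each point gets uniform
    noise in [-0.5, 0.5] per coordinate.
    """
    rng = random.Random(seed)
    centers = [(10, 3), (10, -3), (-10, 3), (-10, -3)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(per_leaf):
            points.append((cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)))
            labels.append(label)
    return points, labels


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge.

    Each cluster is given as (size, centroid).
    """
    (na, ca), (nb, cb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


def ward_linkage(points):
    """Naive agglomerative clustering with Ward's criterion.

    Returns a binary tree: leaves are point indices, internal nodes are
    (left, right) tuples.
    """
    # Each active cluster is (tree, size, centroid).
    clusters = [(i, 1, tuple(p)) for i, p in enumerate(points)]
    while len(clusters) > 1:
        # Merge the pair that adds the least within-cluster variance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]][1:], clusters[ij[1]][1:]),
        )
        (ti, ni, ci), (tj, nj, cj) = clusters[i], clusters[j]
        centroid = tuple((ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj))
        merged = ((ti, tj), ni + nj, centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def leaves(node):
    """Point indices under a tree node."""
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def dendrogram_purity(root, labels):
    """Average, over same-label pairs, of that label's share of the leaves
    in the pair's smallest common subtree."""
    scores = []

    def visit(node):
        if isinstance(node, int):
            return
        visit(node[0])
        visit(node[1])
        left, right = leaves(node[0]), leaves(node[1])
        subtree = left + right
        # Pairs split across the two children have this node as their LCA.
        for a in left:
            for b in right:
                if labels[a] == labels[b]:
                    same = sum(labels[x] == labels[a] for x in subtree)
                    scores.append(same / len(subtree))

    visit(root)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    points, labels = planted_hierarchy()
    root = ward_linkage(points)
    print(dendrogram_purity(root, labels))  # prints 1.0
```

In this configuration, any merge within a leaf cluster costs less than any merge across leaf clusters, and sibling merges cost less than merges across the two pairs. Ward's method therefore rebuilds the planted tree exactly, and the purity is 1.0. The loop recomputes every pairwise merge cost at each step, which takes roughly cubic time in the number of points. This keeps the Ward criterion explicit. Library routines such as SciPy's `linkage(..., method='ward')` use faster algorithms.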