论文标题
缠结聚类:算法框架和理论保证
Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees
论文作者
论文摘要
最初,缠结是数学图理论中的抽象工具,以证明著名的图形次要定理。在本文中,我们展示了机器学习应用中缠结的实际潜力。给定任何数据集的切割集合,缠结将这些切口汇总为指向致密结构的方向。结果,群集的柔和特征是一组一致的指针。这种高度灵活的方法可以解决各种设置中的聚类问题,从图表中的社区检测问卷到公制空间中的聚类点。我们提出的框架的输出是分层的,并引起了软模拟图的概念,这可以帮助探索数据集的群集结构。汇总削减的计算复杂性在数据点数中是线性的。因此,缠结方法的瓶颈是生成切口,为此,简单而快速的算法形成了足够的基础。在我们的论文中,我们构建了与缠结聚类的算法框架,在各种环境中证明了理论保证,并提供了广泛的模拟和用例。 Python代码可在GitHub上找到。
Originally, tangles were invented as an abstract tool in mathematical graph theory to prove the famous graph minor theorem. In this paper, we showcase the practical potential of tangles in machine learning applications. Given a collection of cuts of any dataset, tangles aggregate these cuts to point in the direction of a dense structure. As a result, a cluster is softly characterized by a set of consistent pointers. This highly flexible approach can solve clustering problems in various setups, ranging from questionnaires over community detection in graphs to clustering points in metric spaces. The output of our proposed framework is hierarchical and induces the notion of a soft dendrogram, which can help explore the cluster structure of a dataset. The computational complexity of aggregating the cuts is linear in the number of data points. Thus the bottleneck of the tangle approach is to generate the cuts, for which simple and fast algorithms form a sufficient basis. In our paper we construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases. Python code is available on github.