Paper Title

Clustering Based on Graph of Density Topology

Authors

Zhangyang Gao, Haitao Lin, Stan Z. Li

Abstract

Clustering data that is unevenly distributed and heavily contaminated by noise is challenging. HDBSCAN is currently considered the SOTA algorithm for this problem. In this paper, we propose a novel clustering algorithm based on what we call the graph of density topology (GDT). GDT jointly considers the local and global structures of the data samples. It first forms local clusters through a density growing process, with a strategy for proper noise handling and cluster boundary detection. It then estimates a GDT from the relationships between local clusters in terms of a connectivity measure, giving a global topological graph. The connectivity, which measures the similarity between neighboring local clusters, is based on local clusters rather than individual points, which makes it robust even to very heavy noise. Evaluation results on both toy and real-world datasets show that GDT achieves SOTA performance on almost all popular datasets and has a low time complexity of O(n log n). The code is available at https://github.com/gaozhangyang/DGC.git.
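The two-stage idea in the abstract (local clusters grown along density, then a graph over those clusters built from a connectivity score) can be illustrated with a small sketch. This is not the authors' implementation, which lives in the repository linked above. The function names `knn` and `gdt_sketch`, the parameters `k`, `noise_frac` and `merge_thresh`, the inverse mean-distance density estimate, and the connectivity formula are all assumptions made for illustration. The neighbour search is brute force, so this sketch runs in O(n^2) rather than the O(n log n) reported in the abstract.

```python
import math


def knn(points, k):
    """Brute-force k nearest neighbours as (distance, index) pairs.

    This is O(n^2); a spatial index would be needed to approach O(n log n).
    """
    out = []
    for i, p in enumerate(points):
        cand = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append(cand[:k])
    return out


def gdt_sketch(points, k=5, noise_frac=0.1, merge_thresh=0.5):
    """Return one label per point; -1 marks points treated as noise."""
    n = len(points)
    nbrs = knn(points, k)
    # Assumed density estimate: inverse of the mean distance to the k neighbours.
    density = [k / (sum(d for d, _ in nb) + 1e-12) for nb in nbrs]
    # Assumed noise rule: points below a low density quantile are dropped.
    cutoff = sorted(density)[int(noise_frac * n)]
    noise = [rho < cutoff for rho in density]

    # Stage 1 (local clusters): link each point to its densest strictly denser
    # neighbour; points without one are local density peaks (roots).
    parent = list(range(n))
    for i in range(n):
        if noise[i]:
            continue
        denser = [j for _, j in nbrs[i] if density[j] > density[i]]
        if denser:
            parent[i] = max(denser, key=lambda j: density[j])

    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    local = [-1 if noise[i] else root(i) for i in range(n)]

    # Stage 2 (connectivity): score each pair of adjacent local clusters by the
    # densest boundary link between them, relative to the lower of the two peaks.
    conn = {}
    for i in range(n):
        for _, j in nbrs[i]:
            a, b = local[i], local[j]
            if a == -1 or b == -1 or a == b:
                continue
            key = (min(a, b), max(a, b))
            score = min(density[i], density[j]) / min(density[a], density[b])
            conn[key] = max(conn.get(key, 0.0), score)

    # Stage 3 (topology graph): merge local clusters joined by a strong edge,
    # using a small union-find over the local cluster roots.
    group = {r: r for r in set(local) if r != -1}

    def find(r):
        while group[r] != r:
            group[r] = group[group[r]]
            r = group[r]
        return r

    for (a, b), score in conn.items():
        if score >= merge_thresh:
            group[find(a)] = find(b)

    return [-1 if r == -1 else find(r) for r in local]
```

Calling `gdt_sketch(points)` on a list of coordinate tuples returns one integer label per point. Because the connectivity score is computed between local clusters rather than between single points, one stray point lying between two groups cannot bridge them on its own, which is the robustness argument the abstract makes.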
