Paper Title
AMD-DBSCAN: An Adaptive Multi-density DBSCAN for datasets of extremely variable density
Paper Authors
Abstract
DBSCAN is a widely used density-based clustering algorithm. However, with the growing demand for multi-density clustering, traditional DBSCAN cannot produce good clustering results on multi-density datasets. To address this problem, this paper proposes an adaptive multi-density DBSCAN algorithm (AMD-DBSCAN). AMD-DBSCAN introduces an improved parameter adaptation method that searches for multiple parameter pairs (i.e., Eps and MinPts), the key parameters that determine clustering results and performance, which allows the model to be applied to multi-density datasets. Moreover, AMD-DBSCAN requires only one hyperparameter, avoiding complicated repetitive initialization operations. Furthermore, the variance of the number of neighbors (VNN) is proposed to measure the difference in density between clusters. Experimental results show that AMD-DBSCAN reduces execution time by an average of 75% compared with the traditional adaptive algorithm, owing to its lower algorithmic complexity. In addition, AMD-DBSCAN improves accuracy by 24.7% on average over the state-of-the-art design on multi-density datasets of extremely variable density, with no performance loss in single-density scenarios. Our code and datasets are available at https://github.com/AlexandreWANG915/AMD-DBSCAN.
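To make the VNN idea concrete, the following is a minimal sketch of one plausible reading of "variance of the number of neighbors". The abstract does not give the exact definition, so this is an assumption and not the authors' implementation: each point's neighbors are counted within a fixed radius `eps`, and the population variance of those counts is returned. The function names `neighbor_counts` and `vnn` and the toy datasets are hypothetical and do not come from the paper or its repository.

```python
import math
from statistics import pvariance


def neighbor_counts(points, eps):
    """For each point, count the other points within Euclidean distance eps.

    Assumption: a neighbor is any other point at distance <= eps.
    The paper may define neighborhoods differently.
    """
    counts = []
    for i, p in enumerate(points):
        n = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= eps
        )
        counts.append(n)
    return counts


def vnn(points, eps):
    """Population variance of the per-point neighbor counts (assumed VNN)."""
    return pvariance(neighbor_counts(points, eps))


# Toy single-density data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Toy multi-density data: a tight 3x3 cluster plus three isolated points.
mixed = [(0.1 * x, 0.1 * y) for x in range(3) for y in range(3)]
mixed += [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]

if __name__ == "__main__":
    print("single-density VNN:", vnn(square, 1.0))
    print("multi-density VNN:", vnn(mixed, 1.0))
```

Under this reading, every point in the single-density toy set has the same number of neighbors, so the variance is zero. The multi-density set mixes points with many neighbors and points with none, so the variance is large. This is the kind of contrast a density-difference measure would need to capture.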