快速密度峰聚类：基于多回路的平行化方法

论文标题

快速密度峰聚类：基于多回路的平行化方法

Fast Density-Peaks Clustering: Multicore-based Parallelization Approach

论文作者

Amagata, Daichi, Hara, Takahiro

论文摘要

聚类多维点是许多领域的基本任务，基于密度的聚类支持许多应用程序，因为它可以发现任意形状的群集。本文解决了密度峰聚类（DPC）的问题，这是一个最近提出的基于密度的聚类框架。尽管DPC已经有很多应用程序，但其直接实现将二次计算构成给定数据集中的点数，从而无法扩展到大型数据集。要在大型数据集上启用DPC，我们提出了DPC的有效算法。具体而言，我们提出了一种精确的算法，EX-DPC和两种近似算法，即DPC和S-AppRox-DPC。在关于DPC参数的合理假设下，我们的算法是下季度的，即打破二次屏障。此外，大约DPC不需要任何其他参数，并且可以返回与EX-DPC相同的群集中心，从而呈现出准确的聚类结果。 S-Approx-DPC需要近似参数，但可以加快其计算效率。我们进一步表明，他们的效率可以通过利用多核心处理来加速。我们使用合成数据集进行了广泛的实验，我们的实验结果表明我们的算法是有效，可扩展和准确的。

Clustering multi-dimensional points is a fundamental task in many fields, and density-based clustering supports many applications as it can discover clusters of arbitrary shapes. This paper addresses the problem of Density-Peaks Clustering (DPC), a recently proposed density-based clustering framework. Although DPC already has many applications, its straightforward implementation incurs a quadratic time computation to the number of points in a given dataset, thereby does not scale to large datasets. To enable DPC on large datasets, we propose efficient algorithms for DPC. Specifically, we propose an exact algorithm, Ex-DPC, and two approximation algorithms, Approx-DPC and S-Approx-DPC. Under a reasonable assumption about a DPC parameter, our algorithms are sub-quadratic, i.e., break the quadratic barrier. Besides, Approx-DPC does not require any additional parameters and can return the same cluster centers as those of Ex-DPC, rendering an accurate clustering result. S-Approx-DPC requires an approximation parameter but can speed up its computational efficiency. We further present that their efficiencies can be accelerated by leveraging multicore processing. We conduct extensive experiments using synthetic and real datasets, and our experimental results demonstrate that our algorithms are efficient, scalable, and accurate.

下载PDF全文

下载文献需遵守相关版权规定

论文标题