论文标题
光谱群集的平均灵敏度
Average Sensitivity of Spectral Clustering
论文作者
论文摘要
光谱群集是在图中查找簇的最流行的聚类方法之一,该方法在数据挖掘中发现了许多应用程序。但是,这些应用程序中的输入图可能由于测量错误,出于隐私原因或数据转换任意性而缺少许多边缘。为了基于光谱聚类做出可靠,有效的决策,我们使用平均灵敏度的概念评估了在输入图中对边缘扰动的光谱聚类的稳定性,这是我们在随机删除边缘之前和之后的输出簇对称差异的预期大小。 我们首先证明,光谱聚类的平均灵敏度与$λ_2/λ_3^2 $成正比,其中$λ_i$是(归一化)laplacian的$ i $ $ i $ thest最小特征值。我们还证明了一个类似的限制,以$ k $ - 频谱聚类,将图表分为$ k $簇。然后,我们通过对合成和真实网络进行实验来从经验上证实我们的理论界限。我们的结果表明,当输入图中存在群集结构时,光谱聚类在边缘扰动上是稳定的。
Spectral clustering is one of the most popular clustering methods for finding clusters in a graph, which has found many applications in data mining. However, the input graph in those applications may have many missing edges due to error in measurement, withholding for a privacy reason, or arbitrariness in data conversion. To make reliable and efficient decisions based on spectral clustering, we assess the stability of spectral clustering against edge perturbations in the input graph using the notion of average sensitivity, which is the expected size of the symmetric difference of the output clusters before and after we randomly remove edges. We first prove that the average sensitivity of spectral clustering is proportional to $λ_2/λ_3^2$, where $λ_i$ is the $i$-th smallest eigenvalue of the (normalized) Laplacian. We also prove an analogous bound for $k$-way spectral clustering, which partitions the graph into $k$ clusters. Then, we empirically confirm our theoretical bounds by conducting experiments on synthetic and real networks. Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.