Paper Title
Bagged $k$-Distance for Mode-Based Clustering Using the Probability of Localized Level Sets
Paper Authors
Paper Abstract
In this paper, we propose an ensemble learning algorithm named \textit{bagged $k$-distance for mode-based clustering} (\textit{BDMBC}) by putting forward a new measurement called the \textit{probability of localized level sets} (\textit{PLLS}), which enables us to find all clusters for varying densities with a global threshold. On the theoretical side, we show that with a properly chosen number of nearest neighbors $k_D$ in the bagged $k$-distance, the sub-sample size $s$, the bagging rounds $B$, and the number of nearest neighbors $k_L$ for the localized level sets, BDMBC can achieve optimal convergence rates for mode estimation. It turns out that with a relatively small $B$, the sub-sample size $s$ can be much smaller than the number of training data $n$ at each bagging round, and the number of nearest neighbors $k_D$ can be reduced simultaneously. Moreover, we establish optimal convergence results for the level set estimation of the PLLS in terms of Hausdorff distance, which reveals that BDMBC can find localized level sets for varying densities and thus enjoys local adaptivity. On the practical side, we conduct numerical experiments to empirically verify the effectiveness of BDMBC for mode estimation and level set estimation, which demonstrates the promising accuracy and efficiency of our proposed algorithm.
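The construction the abstract describes can be sketched in a few lines: average the $k_D$-th nearest-neighbor distance over $B$ sub-samples of size $s$ (the bagged $k$-distance), then score each point by how its bagged $k$-distance ranks among its $k_L$ nearest neighbors. This is a minimal numpy sketch for intuition only; the `plls` scoring below is an illustrative reading of the "probability of localized level sets" idea, not the paper's exact definition, and all parameter values are hypothetical.

```python
import numpy as np

def bagged_k_distance(X, k_D, s, B, rng=None):
    """Average the k_D-th nearest-neighbor distance over B random
    sub-samples of size s (a sketch of the bagged k-distance)."""
    rng = np.random.default_rng(rng)
    n = len(X)
    acc = np.zeros(n)
    for _ in range(B):
        sub = X[rng.choice(n, size=s, replace=False)]
        # Distances from every point to each sub-sample point; for points
        # that land in the sub-sample, the zero self-distance is included.
        d = np.linalg.norm(X[:, None, :] - sub[None, :, :], axis=2)
        acc += np.sort(d, axis=1)[:, k_D - 1]  # k_D-th smallest distance
    return acc / B

def plls(X, bkd, k_L):
    """Illustrative localized score: the fraction of a point's k_L nearest
    neighbors whose bagged k-distance is at least its own. Scores near 1
    mark local density peaks, so one global threshold can separate
    clusters of very different densities."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    nbr = np.argsort(d, axis=1)[:, :k_L]  # k_L nearest neighbors (incl. self)
    return (bkd[nbr] >= bkd[:, None]).mean(axis=1)
```

A small bagged $k$-distance separates dense regions (short nearest-neighbor distances) from sparse ones, while the localized ranking makes the threshold comparable across regions of different density, matching the local adaptivity claimed in the abstract.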