Paper Title

Kernel k-Means, By All Means: Algorithms and Strong Consistency

Paper Authors

Debolina Paul, Saptarshi Chakraborty, Swagatam Das, Jason Xu

Paper Abstract

Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from non-convexity of the underlying objective function. In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions to the kernel and multi-kernel settings. Called Kernel Power $k$-Means, our algorithm makes use of majorization-minimization (MM) to better solve this non-convex problem. We show the method implicitly performs annealing in kernel feature space while retaining efficient, closed-form updates, and we rigorously characterize its convergence properties both from computational and statistical points of view. In particular, we characterize the large sample behavior of the proposed method by establishing strong consistency guarantees. Its merits are thoroughly validated on a suite of simulated datasets and real data benchmarks that feature non-linear and multi-view separation.
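To make the abstract's ingredients concrete (the kernel trick for implicit feature-space centers, power-mean weighting, MM updates, and annealing), here is a minimal sketch. It is not the authors' reference implementation: the function name `kernel_power_kmeans`, the geometric annealing rate `eta`, and the stability cap `s_min` are illustrative assumptions, and the weight update follows the standard gradient-of-a-power-mean form used in power k-means-style MM arguments.

```python
import numpy as np

def kernel_power_kmeans(K, k, s0=-1.0, eta=1.05, s_min=-20.0,
                        n_iter=100, seed=0):
    """Illustrative sketch: kernel k-means with power-mean MM weights.

    K     : (n, n) precomputed Gram matrix
    k     : number of clusters
    s0    : initial power-mean exponent (s < 0), annealed toward s_min
    eta   : assumed geometric annealing rate for s
    s_min : cap on s, purely for numerical stability in this sketch
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    W = rng.random((n, k))          # initial (unnormalized) cluster weights
    s = s0
    for _ in range(n_iter):
        # Implicit feature-space centers c_j = sum_i V[i, j] * phi(x_i),
        # with V the column-normalized weights; squared distances follow
        # from the kernel trick:
        #   ||phi(x_i) - c_j||^2 = K_ii - 2 (K V)_ij + (V^T K V)_jj.
        V = W / W.sum(axis=0, keepdims=True)
        d2 = (np.diag(K)[:, None]
              - 2.0 * K @ V
              + np.sum(V * (K @ V), axis=0)[None, :])
        d2 = np.maximum(d2, 1e-12)  # guard against round-off negatives
        # MM weights = gradient of the power mean M_s over each row of d2;
        # for s < 0 these favor the closest centers, and as s -> -infinity
        # the objective approaches ordinary (min-distance) kernel k-means.
        M = np.mean(d2 ** s, axis=1) ** (1.0 / s)
        W = (d2 ** (s - 1.0)) * (M ** (1.0 - s))[:, None] / k
        s = max(s * eta, s_min)     # anneal s toward -infinity
    return d2.argmin(axis=1)        # hard labels from the final distances
```

A Gram matrix for the sketch could come from, e.g., `sklearn.metrics.pairwise.rbf_kernel(X)`; in the multi-kernel setting the abstract also mentions, `K` would instead be a (learned or fixed) combination of several candidate kernels.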
