论文标题

强劲的$ k $ - 三瞬间的分布聚类

Robust $k$-means Clustering for Distributions with Two Moments

论文作者

Klochkov, Yegor, Kroshnin, Alexey, Zhivotovskiy, Nikita

论文摘要

我们考虑了$ k $ -MEANS聚类问题的强大算法,其中量化器基于$ n $独立观察构建。我们的主要结果是基于均值的非质子过量失真界的中值,这些界限在一般可分开的希尔伯特空间中存在的两个有限矩假设下。特别是,我们的结果扩展了1981年Pollard的著名渐近结果,他表明存在两个时刻足以在$ \ Mathbb {r}^d $中具有经验上最佳量化器的强烈一致性。在$ \ mathbb {r}^d $中聚类的特殊情况,在两个有界矩的情况下,我们证明了(最多可达到恒定因素)非偶然的上限和下限的过量失真,这取决于最佳量化量的光线群的概率质量。我们的边界具有次高斯的形式,并且证明基于稳健平均估计器的均匀界限版本。

We consider the robust algorithms for the $k$-means clustering problem where a quantizer is constructed based on $N$ independent observations. Our main results are median of means based non-asymptotic excess distortion bounds that hold under the two bounded moments assumption in a general separable Hilbert space. In particular, our results extend the renowned asymptotic result of Pollard, 1981 who showed that the existence of two moments is sufficient for strong consistency of an empirically optimal quantizer in $\mathbb{R}^d$. In a special case of clustering in $\mathbb{R}^d$, under two bounded moments, we prove matching (up to constant factors) non-asymptotic upper and lower bounds on the excess distortion, which depend on the probability mass of the lightest cluster of an optimal quantizer. Our bounds have the sub-Gaussian form, and the proofs are based on the versions of uniform bounds for robust mean estimators.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源