Paper Title

Robust Sparse Mean Estimation via Sum of Squares

Paper Authors

Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas

Paper Abstract

We study the problem of high-dimensional sparse mean estimation in the presence of an $ε$-fraction of adversarial outliers. Prior work obtained sample and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb R^d$ with "certifiably bounded" $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(ε^{1-1/t})$ with sample complexity $m = (k\log(d))^{O(t)}/ε^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde O(ε)$ with sample complexity $m = O(k^4 \mathrm{polylog}(d))/ε^2$. Our algorithms follow the Sum-of-Squares-based proofs-to-algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively the best possible.
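
As a concrete illustration of the contamination model described in the abstract (not the paper's Sum-of-Squares algorithm), the minimal Python sketch below draws identity-covariance Gaussian samples with a $k$-sparse mean, replaces an $ε$-fraction of them with outliers, and compares the plain sample mean against a naive coordinate-wise-median-plus-top-$k$ baseline. All numeric parameters are arbitrary placeholders chosen for illustration.

```python
# Illustration of epsilon-contaminated sparse mean estimation.
# NOT the paper's Sum-of-Squares algorithm; only the problem setup
# and a naive baseline, with arbitrary placeholder parameters.
import numpy as np

rng = np.random.default_rng(0)
d, k, m, eps = 1000, 10, 5000, 0.05  # dimension, sparsity, samples, outlier fraction

# k-sparse true mean: k nonzero coordinates of magnitude 1
mu = np.zeros(d)
mu[:k] = 1.0

# inliers: identity-covariance Gaussian samples centered at mu
X = rng.standard_normal((m, d)) + mu

# adversarial corruption (a crude fixed shift here; a real adversary
# may place the eps-fraction of outliers anywhere)
n_bad = int(eps * m)
X[:n_bad] = 10.0

# naive baseline: coordinate-wise median, then keep the k largest entries
med = np.median(X, axis=0)
support = np.argsort(np.abs(med))[-k:]
mu_hat = np.zeros(d)
mu_hat[support] = med[support]

print("error of plain sample mean:", np.linalg.norm(X.mean(axis=0) - mu))
print("error of median + top-k   :", np.linalg.norm(mu_hat - mu))
```

This naive baseline happens to resist the simple corruption above, but unlike the paper's algorithm it comes with no guarantee against a worst-case adversary and makes no use of certifiably bounded moments.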
