论文标题

列表在几乎线性时间内可解码的平均估计

List Decodable Mean Estimation in Nearly Linear Time

论文作者

Cherapanamjeri, Yeshwanth, Mohanty, Sidhanth, Yau, Morris

论文摘要

在异常值的存在下从数据中学习是统计中的一个基本问题。直到最近,尚无计算有效算法才能计算出在自然假设下甚至在一小部分异常值的情况下,在自然假设下计算高维分布的平均值。在本文中,我们考虑在存在压倒性异常值的情况下,在大多数数据集中被引入对手。只有$α<1/2 $的“ inliers”(清洁数据)分布的平均值是无法识别的。但是,在其有影响力的工作中,[CSV17]引入了多项式时间算法,通过输出$ o(1/α)$候选解决方案的简洁列表来恢复具有有界协方差的分布平均值,其中一种保证可以接近真实的分布平均值;错误校正代码理论中的“列表解码”的直接类似物。在这项工作中,我们为在同一设置中列表可解码的平均估计的算法开发出达到常量为常数理论上最佳恢复,最佳样本复杂性以及几乎线性的时间,直到维度为polyogarithmic因子。我们的概念创新是在非洞穴景观上设计一种下降样式算法,迭代地删除了最小值以生成简洁的解决方案列表。我们的运行时瓶颈是鞍点的优化,我们为其设计定制的原始双求解器,用于广义包装,并在KY-Fan Norms下覆盖SDP的SDP,这可能引起独立的兴趣。

Learning from data in the presence of outliers is a fundamental problem in statistics. Until recently, no computationally efficient algorithms were known to compute the mean of a high dimensional distribution under natural assumptions in the presence of even a small fraction of outliers. In this paper, we consider robust statistics in the presence of overwhelming outliers where the majority of the dataset is introduced adversarially. With only an $α< 1/2$ fraction of "inliers" (clean data) the mean of a distribution is unidentifiable. However, in their influential work, [CSV17] introduces a polynomial time algorithm recovering the mean of distributions with bounded covariance by outputting a succinct list of $O(1/α)$ candidate solutions, one of which is guaranteed to be close to the true distributional mean; a direct analog of 'List Decoding' in the theory of error correcting codes. In this work, we develop an algorithm for list decodable mean estimation in the same setting achieving up to constants the information theoretically optimal recovery, optimal sample complexity, and in nearly linear time up to polylogarithmic factors in dimension. Our conceptual innovation is to design a descent style algorithm on a nonconvex landscape, iteratively removing minima to generate a succinct list of solutions. Our runtime bottleneck is a saddle-point optimization for which we design custom primal dual solvers for generalized packing and covering SDP's under Ky-Fan norms, which may be of independent interest.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源