重型分布的异常比例稀疏平均值估计

论文标题

重型分布的异常比例稀疏平均值估计

Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions

论文作者

Diakonikolas, Ilias, Kane, Daniel M., Lee, Jasper C. H., Pensia, Ankit

论文摘要

我们研究了在存在稀疏性的情况下，对重尾分布的均衡平均值估计的基本任务。具体而言，鉴于少数来自高维重尾分布的损坏样本，其平均$μ$保证稀疏，目标是有效计算一个假设，该假设准确地近似于$μ$，具有很高的可能性。先前的工作已经获得了有效的算法，用于对轻尾分布的稀疏平均值估计。在这项工作中，我们为在轻度矩假设下的重尾分布提供了第一个样品效率和多项式稳健稀疏平均估计器。我们的算法使用多个样品与环境维度对数进行对数进行比较，从而实现了最佳的渐近误差。重要的是，我们方法的样本复杂性是具有失败概率$τ$的函数，具有添加$ \ log（1/τ）$依赖性的函数。我们的算法利用了算法鲁棒统计文献的基于稳定的方法，并在我们的环境中需要进行至关重要（和必要的）适应。我们的分析可能具有独立的兴趣，涉及（非光谱）分解的精致设计，用于满足某些稀疏性能的阳性半明确矩阵。

We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $μ$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $μ$ with high probability. Prior work had obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability $τ$, having an additive $\log(1/τ)$ dependence. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.

下载PDF全文

下载文献需遵守相关版权规定

论文标题