SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm

Paper Title

SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm

Authors

Yi Hao, Ayush Jain, Alon Orlitsky, Vaishakh Ravindrakumar

Abstract

Sample- and computationally-efficient distribution estimation is a fundamental tenet in statistics and machine learning. We present SURF, an algorithm for approximating distributions by piecewise polynomials. SURF is: simple, replacing prior complex optimization techniques with straightforward approximation of each potential polynomial piece through empirical-probability interpolation, and using plain divide-and-conquer to merge the pieces; universal, as well-known polynomial-approximation results imply that it accurately approximates a large class of common distributions; robust to distribution mis-specification, as for any degree $d \le 8$ it estimates any distribution to an $\ell_1$ distance $< 3$ times that of the nearest degree-$d$ piecewise polynomial, improving known factor upper bounds of 3 for single polynomials and 15 for polynomials with arbitrarily many pieces; fast, using optimal sample complexity, running in near sample-linear time, and, if given sorted samples, parallelizable to run in sub-linear time. In experiments, SURF outperforms state-of-the-art algorithms.
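The "empirical-probability interpolation" step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fit_piece`, the use of equal-width sub-intervals, and the monomial basis are all assumptions made for illustration. The idea shown is only that a degree-$d$ polynomial on an interval is fixed by requiring its integral over each of $d+1$ sub-intervals to equal the fraction of samples falling there.

```python
import numpy as np

def fit_piece(samples, n_total, a, b, degree):
    """Fit one polynomial piece on [a, b] (hypothetical helper, not from the paper).

    The returned coefficients (increasing powers of x) define a polynomial whose
    integral over each of degree+1 equal-width sub-intervals of [a, b] equals
    the empirical probability of that sub-interval.
    """
    # Assumption: equal-width sub-intervals; the paper may place nodes differently.
    edges = np.linspace(a, b, degree + 2)
    # Empirical probability mass of each sub-interval, relative to all samples.
    counts, _ = np.histogram(samples, bins=edges)
    masses = counts / n_total
    # A[i, k] = integral of x^k over sub-interval i.
    k = np.arange(degree + 1)
    A = (edges[1:, None] ** (k + 1) - edges[:-1, None] ** (k + 1)) / (k + 1)
    # Solve the (degree+1) x (degree+1) linear system for the coefficients.
    return np.linalg.solve(A, masses)

coeffs = fit_piece(np.array([0.1, 0.2, 0.3, 0.9]), 4, 0.0, 1.0, 1)
print(coeffs)  # coefficients of the density 2 - 2x on [0, 1]
```

With the four samples above, three fall in $[0, 0.5]$ and one in $[0.5, 1]$, so the fitted line $2 - 2x$ puts mass 0.75 and 0.25 on the two halves. A full algorithm would additionally decide how to split and merge intervals, which this sketch does not attempt.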
