Paper title
SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm
Paper authors
Paper abstract
Sample- and computationally efficient distribution estimation is a fundamental tenet in statistics and machine learning. We present SURF, an algorithm for approximating distributions by piecewise polynomials. SURF is: simple, replacing prior complex optimization techniques with a straightforward approximation of each potential polynomial piece through simple empirical-probability interpolation, and using plain divide-and-conquer to merge the pieces; universal, as well-known polynomial-approximation results imply that it accurately approximates a large class of common distributions; robust to distribution mis-specification, as for any degree $d \le 8$ it estimates any distribution to an $\ell_1$ distance $< 3$ times that of the nearest degree-$d$ piecewise polynomial, improving known factor upper bounds of 3 for single polynomials and 15 for polynomials with arbitrarily many pieces; fast, using optimal sample complexity, running in near sample-linear time, and, if given sorted samples, parallelizable to run in sub-linear time. In experiments, SURF outperforms state-of-the-art algorithms.
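To make the idea of empirical-probability interpolation concrete, here is a minimal sketch of fitting a single polynomial piece. It is an illustration under stated assumptions, not the paper's procedure: the function name `fit_piece`, the use of equal-width subintervals as interpolation nodes, and the monomial basis are all choices made here for simplicity. The abstract does not specify how SURF selects its nodes, and the divide-and-conquer merging step is not reproduced.

```python
import numpy as np

def fit_piece(samples, a, b, d):
    """Fit a degree-d polynomial on [a, b] whose integral over each of
    d+1 subintervals equals the empirical probability mass falling there.

    Assumption (not from the paper): the subintervals have equal width.
    """
    samples = np.asarray(samples, dtype=float)
    edges = np.linspace(a, b, d + 2)               # d+1 subintervals
    counts, _ = np.histogram(samples, bins=edges)  # samples outside [a, b] are ignored
    masses = counts / samples.size                 # empirical probabilities
    k = np.arange(d + 1)
    # A[j, k] = integral of x^k over [edges[j], edges[j+1]]
    A = (edges[1:, None] ** (k + 1) - edges[:-1, None] ** (k + 1)) / (k + 1)
    coeffs = np.linalg.solve(A, masses)            # c_0..c_d, lowest degree first
    return coeffs, masses, A

# Usage: evenly spread points on [0, 1] stand in for uniform samples.
samples = (np.arange(1000) + 0.5) / 1000
coeffs, masses, A = fit_piece(samples, 0.0, 1.0, d=2)
density_at_half = np.polynomial.polynomial.polyval(0.5, coeffs)
```

The linear system has a unique solution because a nonzero degree-$d$ polynomial cannot integrate to zero over $d+1$ disjoint intervals. The monomial basis becomes ill-conditioned as the degree grows or when the interval lies far from the origin, so a practical version would likely rescale the interval or use an orthogonal basis.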