Paper Title


Optimal-Degree Polynomial Approximations for Exponentials and Gaussian Kernel Density Estimation

Authors

Amol Aggarwal and Josh Alman

Abstract


For any real numbers $B \ge 1$ and $δ \in (0, 1)$ and function $f: [0, B] \rightarrow \mathbb{R}$, let $d_{B; δ} (f) \in \mathbb{Z}_{> 0}$ denote the minimum degree of a polynomial $p(x)$ satisfying $\sup_{x \in [0, B]} \big| p(x) - f(x) \big| < δ$. In this paper, we provide precise asymptotics for $d_{B; δ} (e^{-x})$ and $d_{B; δ} (e^{x})$ in terms of both $B$ and $δ$, improving both the previously known upper bounds and lower bounds. In particular, we show $$d_{B; δ} (e^{-x}) = Θ\left( \max \left\{ \sqrt{B \log(δ^{-1})}, \frac{\log(δ^{-1}) }{ \log(B^{-1} \log(δ^{-1}))} \right\}\right), \text{ and}$$ $$d_{B; δ} (e^{x}) = Θ\left( \max \left\{ B, \frac{\log(δ^{-1}) }{ \log(B^{-1} \log(δ^{-1}))} \right\}\right).$$

Polynomial approximations for $e^{-x}$ and $e^x$ have applications to the design of algorithms for many problems, and our degree bounds show both the power and limitations of these algorithms. We focus in particular on the Batch Gaussian Kernel Density Estimation problem for $n$ sample points in $Θ(\log n)$ dimensions with error $δ = n^{-Θ(1)}$. We show that the running time one can achieve depends on the square of the diameter of the point set, $B$, with a transition at $B = Θ(\log n)$ mirroring the corresponding transition in $d_{B; δ} (e^{-x})$:

- When $B = o(\log n)$, we give the first algorithm running in time $n^{1 + o(1)}$.
- When $B = κ \log n$ for a small constant $κ > 0$, we give an algorithm running in time $n^{1 + O(\log \log κ^{-1} / \log κ^{-1})}$. The $\log \log κ^{-1} / \log κ^{-1}$ term in the exponent comes from analyzing the behavior of the leading constant in our computation of $d_{B; δ} (e^{-x})$.
- When $B = ω(\log n)$, we show that time $n^{2 - o(1)}$ is necessary assuming SETH.
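The first degree bound can be sanity-checked numerically. The sketch below (not from the paper; the helper name and parameters are illustrative) finds the smallest degree whose Chebyshev interpolant of $e^{-x}$ on $[0, B]$ achieves sup-error below $δ$, and compares it against the $Θ(\cdot)$ expression, which hides an unspecified constant. Chebyshev interpolation is near-optimal for smooth functions, so it serves as a reasonable empirical proxy for $d_{B; δ}(e^{-x})$.

```python
import numpy as np

def min_cheb_degree(f, B, delta, max_deg=60):
    """Smallest degree whose Chebyshev interpolant on [0, B] approximates
    f to sup-error < delta (an empirical proxy for d_{B; delta}(f))."""
    grid = np.linspace(0.0, B, 2001)  # dense grid to estimate the sup norm
    for n in range(1, max_deg + 1):
        k = np.arange(n + 1)
        # Chebyshev nodes on [-1, 1], mapped affinely onto [0, B]
        nodes = 0.5 * B * (np.cos((2 * k + 1) * np.pi / (2 * n + 2)) + 1.0)
        p = np.polyfit(nodes, f(nodes), n)  # interpolate at the nodes
        if np.max(np.abs(np.polyval(p, grid) - f(grid))) < delta:
            return n
    return None

# Example: e^{-x} on [0, 1] with delta = 1e-6. Here B < log(delta^{-1}),
# so the second term of the max dominates the Theta(...) expression.
B, delta = 1.0, 1e-6
d = min_cheb_degree(lambda x: np.exp(-x), B, delta)
L = np.log(1.0 / delta)
bound = max(np.sqrt(B * L), L / np.log(L / B))
print(d, bound)  # empirical degree vs. the asymptotic expression
```

For these parameters the empirical degree and the expression land in the same small range, consistent with the theorem up to its hidden constant.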
