测试数据binnings

论文标题

测试数据binnings

Testing Data Binnings

论文作者

Canonne, Clément L., Wimmer, Karl

论文摘要

由数据量化问题和“ binning”的问题，我们重新审视了离散概率分布的身份测试问题。身份测试（又称一个样本测试），一个基本和现在的分配测试问题，问，给定参考分布（型号）$ \ mathbf {q} $，以及来自未知分布$ \ mathbf {p {p {p} $的样本，均超过$ [n]等于$ \ mathbf {q} $，或与之显着不同。 In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution $\mathbf{q}$ is over $k \ll n$ elements: the question is then whether there exists a suitable binning of the domain $[n]$ into $k$ intervals such that, once "binned," $\mathbf{p}$ is equal to $\mathbf{q}$.我们在这个新问题的样本复杂性上提供了几乎紧密的上限和下限，这在Vanilla身份测试中既显示了定量和定性的差异，又回答了Canonne（2019）的开放问题。最后，我们讨论了几个扩展和相关的研究方向。

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both over $[n]=\{1,2,\dots,n\}$, whether $\mathbf{p}$ equals $\mathbf{q}$, or is significantly different from it. In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution $\mathbf{q}$ is over $k \ll n$ elements: the question is then whether there exists a suitable binning of the domain $[n]$ into $k$ intervals such that, once "binned," $\mathbf{p}$ is equal to $\mathbf{q}$. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne (2019). Finally, we discuss several extensions and related research directions.

下载PDF全文

下载文献需遵守相关版权规定

论文标题