论文标题
测试数据binnings
Testing Data Binnings
论文作者
论文摘要
由数据量化问题和“ binning”的问题,我们重新审视了离散概率分布的身份测试问题。身份测试(又称一个样本测试),一个基本和现在的分配测试问题,问,给定参考分布(型号)$ \ mathbf {q} $,以及来自未知分布$ \ mathbf {p {p {p} $的样本,均超过$ [n]等于$ \ mathbf {q} $,或与之显着不同。 In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution $\mathbf{q}$ is over $k \ll n$ elements: the question is then whether there exists a suitable binning of the domain $[n]$ into $k$ intervals such that, once "binned," $\mathbf{p}$ is equal to $\mathbf{q}$.我们在这个新问题的样本复杂性上提供了几乎紧密的上限和下限,这在Vanilla身份测试中既显示了定量和定性的差异,又回答了Canonne(2019)的开放问题。最后,我们讨论了几个扩展和相关的研究方向。
Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both over $[n]=\{1,2,\dots,n\}$, whether $\mathbf{p}$ equals $\mathbf{q}$, or is significantly different from it. In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution $\mathbf{q}$ is over $k \ll n$ elements: the question is then whether there exists a suitable binning of the domain $[n]$ into $k$ intervals such that, once "binned," $\mathbf{p}$ is equal to $\mathbf{q}$. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne (2019). Finally, we discuss several extensions and related research directions.