Paper Title
Learning Deep Kernels for Non-Parametric Two-Sample Tests
Authors
Abstract
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power. These tests adapt to variations in distribution smoothness and shape over space, and are especially suited to high dimensions and complex data. By contrast, the simpler kernels used in prior kernel testing work are spatially homogeneous, and adaptive only in lengthscale. We explain how this scheme includes popular classifier-based two-sample tests as a special case, but improves on them in general. We provide the first proof of consistency for the proposed adaptation method, which applies both to kernels on deep features and to simpler radial basis kernels or multiple kernel learning. In experiments, we establish the superior performance of our deep kernels in hypothesis testing on benchmark and real-world data. The code for our deep-kernel-based two-sample tests is available at https://github.com/fengliu90/DK-for-TST.
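To make the ingredients above concrete, here is a minimal pure-Python sketch of a kernel two-sample test built on a "deep" kernel. It is an illustration under stated assumptions, not the released implementation in the repository above. The one-layer `feature_map` with fixed weights is a hypothetical stand-in for a trained network. The kernel form `[(1 - eps) * kappa(phi(x), phi(y)) + eps] * q(x, y)` and the power criterion `MMD^2 / sqrt(var + lam)` are assumed forms. All function names and default parameters are hypothetical. The statistic is the standard unbiased MMD² estimate for equal-size samples, and the p-value comes from a permutation test.

```python
import math
import random


def gaussian(a, b, bandwidth):
    """Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def feature_map(x, weights):
    """Hypothetical stand-in for a trained deep net phi: one tanh layer, fixed weights."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def deep_kernel(x, y, weights, eps=0.1, sigma_phi=1.0, sigma_q=1.0):
    """Assumed form [(1 - eps) * kappa(phi(x), phi(y)) + eps] * q(x, y), Gaussian kappa and q."""
    fx, fy = feature_map(x, weights), feature_map(y, weights)
    return ((1.0 - eps) * gaussian(fx, fy, sigma_phi) + eps) * gaussian(x, y, sigma_q)


def h_matrix(K, ix, iy):
    """H[i][j] = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(y_i, x_j), read from kernel matrix K."""
    n = len(ix)
    return [[K[ix[i]][ix[j]] + K[iy[i]][iy[j]] - K[ix[i]][iy[j]] - K[iy[i]][ix[j]]
             for j in range(n)] for i in range(n)]


def mmd2_unbiased(H):
    """Unbiased MMD^2 estimate: mean of H over off-diagonal pairs i != j."""
    n = len(H)
    total = sum(H[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def power_criterion(H, lam=1e-8):
    """Assumed training objective MMD^2 / sqrt(var_H1 + lam), a proxy for test power."""
    n = len(H)
    row_sums = [sum(row) for row in H]
    var = 4.0 / n ** 3 * sum(r * r for r in row_sums) - 4.0 / n ** 4 * sum(row_sums) ** 2
    return mmd2_unbiased(H) / math.sqrt(max(var, 0.0) + lam)


def permutation_test(X, Y, weights, n_perms=200, seed=0):
    """Return (MMD^2 estimate, permutation p-value) for equal-size samples X and Y."""
    assert len(X) == len(Y), "this sketch assumes equal sample sizes"
    n = len(X)
    Z = X + Y
    K = [[deep_kernel(a, b, weights) for b in Z] for a in Z]  # computed once, reused below
    idx = list(range(2 * n))
    observed = mmd2_unbiased(h_matrix(K, idx[:n], idx[n:]))
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perms):
        rng.shuffle(idx)  # random relabelling of the pooled sample, valid under H0: P = Q
        if mmd2_unbiased(h_matrix(K, idx[:n], idx[n:])) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perms + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    W = [[1.0, -0.5], [0.3, 0.8]]  # hypothetical fixed "network" weights
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
    Y = [[rng.gauss(1, 1), rng.gauss(0, 1)] for _ in range(30)]
    stat, p = permutation_test(X, Y, W)
    print(f"MMD^2 = {stat:.4f}, p-value = {p:.3f}")
```

The sketch leaves out training because that step needs automatic differentiation. In practice, one would presumably tune the network weights and bandwidths to maximize a criterion like `power_criterion` on one split of the data. The permutation test would then run on a separate split, so the p-value is not biased by the kernel selection.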