Title
Testing Tail Weight of a Distribution Via Hazard Rate
Authors
Abstract
Understanding the shape of a distribution of data is of interest in a wide variety of fields, as it may affect the types of algorithms used for that data. We study one such problem in the framework of distribution property testing, which characterizes the number of samples required to distinguish whether a distribution has a certain property or is far from having that property. In particular, given samples from a distribution, we seek to characterize the tail of the distribution, that is, to understand how many elements appear infrequently. We develop an algorithm based on a careful bucketing scheme that distinguishes light-tailed distributions from non-light-tailed ones with respect to a definition based on the hazard rate, under natural smoothness and ordering assumptions. We bound the number of samples required for this test to succeed with high probability in terms of the parameters of the problem, showing that it is polynomial in these parameters. Further, we prove a hardness result implying that this problem cannot be solved without any assumptions.
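The light-tailedness notion above is defined via the hazard rate. As a minimal illustration, assuming the standard discrete definition h(i) = p(i) / sum_{j >= i} p(j) over an ordered domain {0, ..., n-1}, the sketch below computes exact and plug-in (empirical) hazard rates. It does not reproduce the paper's exact light-tailedness criterion or its bucketing tester, and the helper names `hazard_rates`, `empirical_pmf`, and `truncated_geometric` are hypothetical.

```python
from collections import Counter
import random
from typing import List, Sequence


def hazard_rates(pmf: Sequence[float]) -> List[float]:
    """Discrete hazard rate h(i) = p(i) / sum_{j >= i} p(j) on the ordered domain 0..n-1.

    Assumption (illustrative convention): indices with zero remaining mass get h(i) = 0.0.
    """
    n = len(pmf)
    suffix = [0.0] * (n + 1)  # suffix[i] = total mass on {i, ..., n-1}
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + pmf[i]
    return [pmf[i] / suffix[i] if suffix[i] > 0 else 0.0 for i in range(n)]


def empirical_pmf(samples: Sequence[int], n: int) -> List[float]:
    """Plug-in estimate of the pmf on {0, ..., n-1} from i.i.d. samples."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(n)]


def truncated_geometric(q: float, n: int) -> List[float]:
    """Geometric pmf q(1-q)^i on {0, ..., n-2}, with all remaining mass placed on n-1."""
    pmf = [q * (1 - q) ** i for i in range(n - 1)]
    pmf.append((1 - q) ** (n - 1))
    return pmf


if __name__ == "__main__":
    pmf = truncated_geometric(0.3, 20)
    # Exact hazard rates: 0.3 at every index except 1.0 at the last one.
    print([round(h, 3) for h in hazard_rates(pmf)])

    # Empirical hazard rates for the first few indices, from seeded samples.
    rng = random.Random(0)
    samples = rng.choices(range(20), weights=pmf, k=10_000)
    print([round(h, 2) for h in hazard_rates(empirical_pmf(samples, 20))[:5]])
```

A geometric distribution has constant hazard rate q. More generally, the tail mass S(i) = sum_{j >= i} p(j) satisfies S(i+1) = S(i)(1 - h(i)), so a hazard rate bounded below by some c > 0 forces S(i) <= (1 - c)^i. Note that the plug-in estimate becomes noisy deep in the tail, where few samples land.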