Paper Title

The UU-test for Statistical Modeling of Unimodal Data

Authors

Paraskevi Chasani, Aristidis Likas

Abstract

Deciding whether a dataset is unimodal is an important problem in data analysis and statistical modeling. The answer reveals the structure of the dataset, i.e., whether the data points were generated by a probability distribution with a single peak or with more than one. Such knowledge is useful for several data analysis problems, such as deciding on the number of clusters and determining unimodal projections. We propose a technique called the UU-test (Unimodal Uniform test) to decide on the unimodality of a one-dimensional dataset. The method operates on the empirical cumulative distribution function (ecdf) of the dataset. It attempts to build a piecewise linear approximation of the ecdf that is unimodal and models the data sufficiently, in the sense that the data corresponding to each linear segment follow a uniform distribution. A unique feature of this approach is that, when the data are unimodal, it also provides a statistical model of the data in the form of a Uniform Mixture Model. We present experimental results that assess the ability of the method to decide on unimodality and compare it with the well-known dip-test approach. In addition, for unimodal datasets, we evaluate the Uniform Mixture Models provided by the proposed method using the test set log-likelihood and the two-sample Kolmogorov-Smirnov (KS) test.
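The building blocks named in the abstract (the ecdf, a uniformity check on the data of one segment, and the log-likelihood of a uniform mixture) can be illustrated with a minimal sketch. This is not the authors' UU-test: the function names are hypothetical, the segment bounds are simply taken as the segment's minimum and maximum, and the step that searches for a unimodal piecewise linear approximation is omitted entirely.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Return F, where F(x) is the fraction of sample points <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ks_uniform_statistic(segment):
    """KS distance between a segment's ecdf and a Uniform(min, max) CDF.

    Assumption: the uniform bounds are estimated from the segment itself,
    and the segment holds at least two distinct values.
    """
    xs = sorted(segment)
    n = len(xs)
    a, b = xs[0], xs[-1]
    d = 0.0
    for i, x in enumerate(xs):
        u = (x - a) / (b - a)  # uniform CDF evaluated at x
        # Compare against the ecdf just after and just before the jump at x.
        d = max(d, (i + 1) / n - u, u - i / n)
    return d


def umm_log_likelihood(points, components):
    """Log-likelihood of points under a uniform mixture.

    components is a list of (weight, a, b) tuples, one per interval [a, b].
    A point lying exactly on a boundary shared by two intervals is counted
    in both, which is harmless for continuous data.
    """
    total = 0.0
    for x in points:
        p = sum(w / (b - a) for w, a, b in components if a <= x <= b)
        total += math.log(p) if p > 0 else float("-inf")
    return total
```

A small KS distance suggests that the segment is close to uniform. Turning that distance into an accept or reject decision requires a critical value or p-value, which this sketch leaves out.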
