Paper Title

Wasserstein Archetypal Analysis

Authors

Katy Craig, Braxton Osting, Dong Wang, Yiming Xu

Abstract

Archetypal analysis is an unsupervised machine learning method that summarizes data using a convex polytope. In its original formulation, for fixed k, the method finds a convex polytope with k vertices, called archetype points, such that the polytope is contained in the convex hull of the data and the mean squared Euclidean distance between the data and the polytope is minimal. In the present work, we consider an alternative formulation of archetypal analysis based on the Wasserstein metric, which we call Wasserstein archetypal analysis (WAA). In one dimension, there exists a unique solution of WAA and, in two dimensions, we prove existence of a solution, as long as the data distribution is absolutely continuous with respect to Lebesgue measure. We discuss obstacles to extending our result to higher dimensions and general data distributions. We then introduce an appropriate regularization of the problem, via a Rényi entropy, which allows us to obtain existence of solutions of the regularized problem for general data distributions, in arbitrary dimensions. We prove a consistency result for the regularized problem, ensuring that if the data are i.i.d. samples from a probability measure, then as the number of samples increases, a subsequence of the archetype points converges to the archetype points for the limiting data distribution, almost surely. Finally, we develop and implement a gradient-based computational approach for the two-dimensional problem, based on the semi-discrete formulation of the Wasserstein metric. Our analysis is supported by detailed computational experiments.
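
For orientation, the original (Euclidean) formulation described in the abstract can be written as an optimization problem. This is only a sketch of that verbal description: the symbols x_i (data points), N (sample size), d (ambient dimension), and a_j (archetype points) are notation assumed here, not taken from the paper.

```latex
% Assumed notation: data x_1, ..., x_N in R^d; archetype points a_1, ..., a_k.
% Requiring each a_j to lie in the convex hull of the data is equivalent to
% requiring the polytope conv{a_1, ..., a_k} to lie in that hull.
\[
  \min_{a_1,\dots,a_k \,\in\, \operatorname{conv}\{x_1,\dots,x_N\}}
  \; \frac{1}{N} \sum_{i=1}^{N}
  \operatorname{dist}\bigl(x_i,\ \operatorname{conv}\{a_1,\dots,a_k\}\bigr)^{2},
  \qquad
  \operatorname{dist}(x, P) = \min_{y \in P} \lVert x - y \rVert_2 .
\]
```

The Wasserstein formulation studied in the paper replaces this mean squared distance with a criterion based on the Wasserstein metric; the abstract does not give its exact form, so it is not written out here.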
