论文标题
在多种多样的快速,准确的最佳运输
Geodesic Sinkhorn for Fast and Accurate Optimal Transport on Manifolds
论文作者
论文摘要
在数据科学中,有效计算分布之间的最佳运输距离的重要性越来越重要。基于Sinkhorn的方法当前是此类计算的最新方法,但需要$ O(n^2)$计算。此外,基于sindhorn的方法通常使用数据点之间的欧几里得地面距离。但是,随着歧管结构化科学数据的普遍性,通常需要考虑地球地面距离。在这里,我们通过提出地球凹痕来解决这两个问题 - 基于在歧管图上扩散热核。值得注意的是,Geodesic sindhorn仅需要$ O(n \ log n)$计算,因为我们基于稀疏的图形laplacian近似使用Chebyshev多项式的热核。我们将我们的方法应用于从接受化学疗法的患者样品中的高维单细胞数据的几个分布的计算。特别是,我们将Barycentric距离定义为两个这样的Barycenters之间的距离。使用此定义,我们确定了与治疗对细胞数据的影响相关的最佳运输距离和路径。
Efficient computation of optimal transport distance between distributions is of growing importance in data science. Sinkhorn-based methods are currently the state-of-the-art for such computations, but require $O(n^2)$ computations. In addition, Sinkhorn-based methods commonly use an Euclidean ground distance between datapoints. However, with the prevalence of manifold structured scientific data, it is often desirable to consider geodesic ground distance. Here, we tackle both issues by proposing Geodesic Sinkhorn -- based on diffusing a heat kernel on a manifold graph. Notably, Geodesic Sinkhorn requires only $O(n\log n)$ computation, as we approximate the heat kernel with Chebyshev polynomials based on the sparse graph Laplacian. We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy. In particular, we define the barycentric distance as the distance between two such barycenters. Using this definition, we identify an optimal transport distance and path associated with the effect of treatment on cellular data.