Title

Geometry of Similarity Comparisons

Authors

Puoya Tabaghi, Jianhao Peng, Olgica Milenkovic, Ivan Dokmanić

Abstract

Many data analysis problems can be cast as distance geometry problems in space forms: Euclidean, spherical, or hyperbolic spaces. Absolute distance measurements are often unreliable or simply unavailable, and only proxies to absolute distances in the form of similarities are available. Hence we ask the following: given only comparisons of similarities amongst a set of entities, what can be said about the geometry of the underlying space form? To study this question, we introduce the notions of the ordinal capacity of a target space form and the ordinal spread of the similarity measurements. The ordinal spread is an indicator of complex patterns in the measurements, while the ordinal capacity quantifies the capacity of a space form to accommodate a set of measurements with a specific ordinal spread profile. We prove that the ordinal capacity of a space form is related to its dimension and the sign of its curvature. This leads to a lower bound on the Euclidean and spherical embedding dimensions of what we term similarity graphs. More importantly, we show that the statistical behavior of the ordinal spread random variables defined on a similarity graph can be used to identify its underlying space form. We support our theoretical claims with experiments on weighted trees, single-cell RNA expression data, and spherical cartographic measurements.
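A minimal sketch, not the authors' method, of what "comparisons of similarities" means in practice: it computes pairwise distances in the three space forms named in the abstract and keeps only the ordinal outcome of comparing one pair of entities against another. The helper names (`euclidean_dist`, `spherical_dist`, `hyperbolic_dist`, `ordinal_comparisons`) are hypothetical. Spherical points are assumed to be unit vectors, and hyperbolic points are assumed to lie on the hyperboloid (Lorentz) model with curvature -1. The ordinal spread and ordinal capacity defined in the paper are not implemented here.

```python
import itertools
import math


def euclidean_dist(x, y):
    """Straight-line distance in Euclidean space."""
    return math.dist(x, y)


def spherical_dist(x, y):
    """Geodesic distance on the unit sphere (assumes unit-norm inputs)."""
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, dot)))


def hyperbolic_dist(x, y):
    """Geodesic distance on the hyperboloid x0^2 - x1^2 - ... = 1, x0 > 0
    (assumed model; curvature -1)."""
    # Negated Lorentzian inner product: x0*y0 - sum_i xi*yi >= 1.
    b = x[0] * y[0] - sum(a * c for a, c in zip(x[1:], y[1:]))
    return math.acosh(max(1.0, b))


def ordinal_comparisons(points, score, higher_is_closer=False):
    """Return {(p, q): bool} over all pairs of distinct entity pairs p, q.

    The value is True when pair p is strictly closer (more similar) than
    pair q; ties give False. `score` is a distance by default, or a
    similarity when higher_is_closer=True.
    """
    pairs = list(itertools.combinations(range(len(points)), 2))
    s = {p: score(points[p[0]], points[p[1]]) for p in pairs}
    out = {}
    for p, q in itertools.combinations(pairs, 2):
        out[(p, q)] = s[p] > s[q] if higher_is_closer else s[p] < s[q]
    return out


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    # Distances are 1, 2 and sqrt(5), so all three comparisons print True.
    for (p, q), closer in ordinal_comparisons(pts, euclidean_dist).items():
        print(f"pair {p} closer than pair {q}: {closer}")
```

Any strictly decreasing transformation of the distances, such as the similarity exp(-d) used with `higher_is_closer=True`, produces exactly the same comparisons. An analysis restricted to these booleans therefore cannot depend on the exact distance values, which is the setting the abstract describes.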
