Paper Title

Measuring Association on Topological Spaces Using Kernels and Geometric Graphs

Authors

Nabarun Deb, Promit Ghosal, Bodhisattva Sen

Abstract

In this paper we propose and study a class of simple, nonparametric, yet interpretable measures of association between two random variables $X$ and $Y$ taking values in general topological spaces. These nonparametric measures, defined using the theory of reproducing kernel Hilbert spaces, capture the strength of dependence between $X$ and $Y$ and have the property that they are 0 if and only if the variables are independent and 1 if and only if one variable is a measurable function of the other. Further, these population measures can be consistently estimated using the general framework of graph functionals, which includes $k$-nearest neighbor graphs and minimum spanning trees. Moreover, a sub-class of these estimators is also shown to adapt to the intrinsic dimensionality of the underlying distribution. Some of these empirical measures can also be computed in near linear time. Under the hypothesis of independence between $X$ and $Y$, these empirical measures (properly normalized) have a standard normal limiting distribution. Thus, these measures can also be readily used to test the hypothesis of mutual independence between $X$ and $Y$. In fact, as far as we are aware, these are the only procedures that possess all the above-mentioned desirable properties. Furthermore, when restricting to Euclidean spaces, we can make these sample measures of association finite-sample distribution-free, under the hypothesis of independence, by using multivariate ranks defined via the theory of optimal transport. The recent correlation coefficients proposed in Dette et al. (2013), Chatterjee (2019), and Azadkia and Chatterjee (2019) can be seen as special cases of this general class of measures.
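To make the 0-versus-1 behaviour concrete, the sketch below computes the univariate rank coefficient of Chatterjee (2019), which the abstract names as a special case. It is not the general kernel and geometric-graph estimator of the paper. It assumes real-valued samples with no ties, and the function names `chatterjee_xi` and `independence_z` are hypothetical. The z-score uses the asymptotic null limit $\sqrt{n}\,\xi_n \to N(0, 2/5)$ that Chatterjee (2019) establishes for continuous $Y$.

```python
import math
import random


def chatterjee_xi(x, y):
    """Chatterjee's rank coefficient xi_n (assumes no ties in x or y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Reorder the y values so that the paired x values are increasing.
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    # ranks[j] = rank (1..n) of y_sorted[j] among all y values.
    ranks = [0] * n
    for r, j in enumerate(sorted(range(n), key=lambda j: y_sorted[j]), start=1):
        ranks[j] = r
    # Sum of absolute rank jumps between x-neighbours.
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def independence_z(xi, n):
    """z-score under independence, using the N(0, 2/5) limit of sqrt(n) * xi_n."""
    return math.sqrt(n) * xi / math.sqrt(2.0 / 5.0)


if __name__ == "__main__":
    random.seed(0)
    n = 2000
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    ys_func = [v * v for v in xs]                    # y is a function of x
    ys_indep = [random.random() for _ in range(n)]   # y independent of x
    print("functional :", chatterjee_xi(xs, ys_func))
    xi0 = chatterjee_xi(xs, ys_indep)
    print("independent:", xi0, "z =", independence_z(xi0, n))
```

In this sketch a noiseless but non-monotone relationship such as $y = x^2$ gives a value close to 1, while independent samples give a value near 0 whose z-score can be compared with standard normal quantiles. The sorting steps dominate the cost, so the computation takes $O(n \log n)$ time.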
