Paper Title

Fast Spatial Autocorrelation

Authors

Anar Amgalan, Lilianne R. Mujica-Parodi, Steven S. Skiena

Abstract

Physical or geographic location proves to be an important feature in many data science models, because many diverse natural and social phenomena have a spatial component. Spatial autocorrelation measures the extent to which locally adjacent observations of the same phenomenon are correlated. Although statistics like Moran's $I$ and Geary's $C$ are widely used to measure spatial autocorrelation, they are slow: all popular methods run in $\Omega(n^2)$ time, rendering them unusable for large data sets or for long time courses with moderate numbers of points. We propose a new $S_A$ statistic based on the notion that the variance observed when merging pairs of nearby clusters should increase slowly for spatially autocorrelated variables. We give a linear-time algorithm to calculate $S_A$ for a variable with an input agglomeration order (available at https://github.com/aamgalan/spatial_autocorrelation). For a typical dataset of $n \approx 63,000$ points, our $S_A$ autocorrelation measure can be computed in 1 second, versus 2 hours or more for Moran's $I$ and Geary's $C$. Through simulation studies, we demonstrate that $S_A$ identifies spatial correlations in variables generated with a spatially dependent model half an order of magnitude earlier than either Moran's $I$ or Geary's $C$. Finally, we prove several theoretical properties of $S_A$: namely, that it behaves as a true correlation statistic and is invariant under addition of or multiplication by a constant.
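
To make the quadratic cost of the baseline statistics concrete, the following is a minimal sketch of Moran's $I$ and Geary's $C$ computed from their textbook definitions. It is not the authors' $S_A$ algorithm, which the abstract does not specify in enough detail to reproduce here. The function name `moran_geary` and the inverse-distance weights $w_{ij} = 1/d_{ij}$ are assumptions made for illustration; other weighting schemes are common.

```python
import math


def moran_geary(points, values):
    """Return (Moran's I, Geary's C) using dense inverse-distance weights.

    The weighting scheme w_ij = 1 / d_ij is an assumption made for this
    sketch; point coordinates must be distinct and values non-constant.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    sum_sq = sum(d * d for d in dev)  # total squared deviation from the mean

    w_total = 0.0   # sum of all weights w_ij
    cross = 0.0     # sum of w_ij * (x_i - mean) * (x_j - mean), for Moran's I
    sq_diff = 0.0   # sum of w_ij * (x_i - x_j)^2, for Geary's C

    # The double loop over unordered pairs is the source of the quadratic cost.
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            w = 1.0 / math.hypot(xi - xj, yi - yj)
            # Weights are symmetric, so each unordered pair counts twice.
            w_total += 2.0 * w
            cross += 2.0 * w * dev[i] * dev[j]
            sq_diff += 2.0 * w * (values[i] - values[j]) ** 2

    moran_i = (n / w_total) * (cross / sum_sq)
    geary_c = ((n - 1) / (2.0 * w_total)) * (sq_diff / sum_sq)
    return moran_i, geary_c
```

Calling `moran_geary(points, values)` with a list of `(x, y)` tuples and a list of observations visits every pair of points once. For $n \approx 63,000$ that is roughly two billion pairs, which illustrates why dense pairwise statistics become impractical at this scale.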
