Paper title
Distributed Estimation and Inference for Spatial Autoregression Model with Large Scale Networks
Authors
Abstract
The rapid growth of online network platforms generates large-scale network data, which poses great challenges for statistical analysis with the spatial autoregression (SAR) model. In this work, we develop a novel distributed estimation and statistical inference framework for the SAR model on a distributed system. We first propose a distributed network least squares approximation (DNLSA) method, which yields a one-step estimator by taking a weighted average of the local estimators computed on each worker. A refined two-step estimator is then designed to further reduce the estimation bias. For statistical inference, we use a random projection method to reduce the expensive communication cost. Theoretically, we establish the consistency and asymptotic normality of both the one-step and two-step estimators. In addition, we provide a theoretical guarantee for the distributed statistical inference procedure. The theoretical findings and computational advantages are validated by several numerical simulations implemented on the Spark system. Lastly, an experiment on the Yelp dataset further illustrates the usefulness of the proposed methodology.
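To make the idea of a one-step estimator built as a weighted average of local estimators concrete, the following is a minimal sketch of generic inverse-covariance weighting. It is not the DNLSA method itself: it assumes an ordinary linear regression rather than a SAR model, unit noise variance, and simulated data, and the names `combine_local_estimates`, `all_X`, and `all_y` are hypothetical, chosen only for illustration.

```python
import numpy as np

def combine_local_estimates(estimates, covariances):
    """Inverse-covariance weighted average of local estimates (illustrative only)."""
    # Weight each local estimate by its precision (inverse covariance) matrix.
    precisions = [np.linalg.inv(c) for c in covariances]
    total_precision = np.sum(precisions, axis=0)
    weighted_sum = np.sum([p @ e for p, e in zip(precisions, estimates)], axis=0)
    # Solve total_precision @ theta = weighted_sum for the combined estimate.
    return np.linalg.solve(total_precision, weighted_sum)

# Assumption: plain linear regression on 4 simulated workers, not a SAR model.
rng = np.random.default_rng(0)
theta_true = np.array([0.5, -1.0])
all_X, all_y = [], []
local_estimates, local_covariances = [], []
for _ in range(4):
    X = rng.normal(size=(500, 2))
    y = X @ theta_true + rng.normal(size=500)
    xtx_inv = np.linalg.inv(X.T @ X)
    local_estimates.append(xtx_inv @ X.T @ y)  # local least squares estimate
    local_covariances.append(xtx_inv)          # its covariance under unit noise variance
    all_X.append(X)
    all_y.append(y)

theta_combined = combine_local_estimates(local_estimates, local_covariances)
print(theta_combined)
```

In this simplified linear setting, each worker only needs to send its local estimate and covariance matrix to the master, and the weighted average coincides with the least squares fit on the pooled data. For the SAR model, the abstract indicates that a further two-step refinement is used to reduce the remaining estimation bias.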