Paper title
Consistency of the PLFit estimator for power-law data
Paper authors
Paper abstract
We prove the consistency of the Power-Law Fit (PLFit) method proposed by Clauset et al. (2009) for estimating the power-law exponent in data coming from a distribution function with a regularly varying tail. In the complex systems community, PLFit has emerged as the method of choice for estimating the power-law exponent. Yet its mathematical properties are still poorly understood. The difficulty with PLFit is that it is a minimum-distance estimator. It first chooses a threshold that minimizes the Kolmogorov-Smirnov distance between the data points larger than the threshold and the Pareto tail, and then applies the Hill estimator to this restricted data. Since the number of order statistics used is random, the general theory of consistency of power-law exponent estimators from extreme value theory does not apply. Our proof first shows that the Hill estimator is consistent for general intermediate sequences for the number of order statistics used, even when that number is random. Here, we call a sequence intermediate when it grows to infinity while remaining much smaller than the sample size. The second, and most involved, step is to prove that the optimizer in PLFit is, with high probability, an intermediate sequence, unless the distribution has a Pareto tail above a certain value. For the latter special case, we give a separate proof.
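The two-stage procedure described in the abstract (pick the threshold by minimizing a Kolmogorov-Smirnov distance, then apply the Hill estimator above it) can be illustrated with a minimal Python sketch. This is not the reference PLFit implementation: the function names `hill_estimate` and `plfit_sketch`, the parameter `k_min`, and the choice to scan every order statistic as a candidate threshold are assumptions made for illustration. The sketch treats continuous, strictly positive data and reports the tail index of the survival function; the density exponent used by Clauset et al. is this value plus one.

```python
import numpy as np


def hill_estimate(sorted_desc, k):
    """Hill estimate of 1/alpha from the k largest values.

    `sorted_desc` must be sorted in descending order; the threshold is
    the (k+1)-th largest value, sorted_desc[k].
    """
    return np.mean(np.log(sorted_desc[:k] / sorted_desc[k]))


def plfit_sketch(data, k_min=2):
    """Illustrative PLFit-style fit (hypothetical helper, not the reference code).

    For each candidate number k of upper order statistics, fit a Pareto tail
    with the Hill estimator and compute the Kolmogorov-Smirnov distance between
    the empirical tail and the fitted Pareto tail. Return (ks, k, alpha) for
    the k with the smallest distance.
    """
    x = np.sort(np.asarray(data, dtype=float))[::-1]  # descending order
    if np.any(x <= 0):
        raise ValueError("data must be strictly positive")
    n = x.size
    best = (np.inf, None, None)
    for k in range(k_min, n):
        gamma = hill_estimate(x, k)
        if gamma <= 0:  # all top-k values tie with the threshold
            continue
        alpha = 1.0 / gamma
        tail = x[:k][::-1] / x[k]            # ascending, each value >= 1
        model_cdf = 1.0 - tail ** (-alpha)   # fitted Pareto CDF on the tail
        upper = np.arange(1, k + 1) / k      # empirical CDF just after each jump
        lower = np.arange(0, k) / k          # empirical CDF just before each jump
        ks = max(np.max(upper - model_cdf), np.max(model_cdf - lower))
        if ks < best[0]:
            best = (ks, k, alpha)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Classical Pareto sample with tail index 2.5 and scale 1.
    sample = rng.pareto(2.5, size=2000) + 1.0
    ks, k, alpha = plfit_sketch(sample)
    print(f"selected k = {k}, KS distance = {ks:.4f}, alpha estimate = {alpha:.3f}")
```

The selected `k` here is data-dependent, which is exactly the randomness in the number of order statistics that the abstract identifies as the obstacle to applying standard consistency results.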