Paper title

Consistency of the PLFit estimator for power-law data

Authors

Bhattacharya, Ayan; Chen, Bohan; van der Hofstad, Remco; Zwart, Bert

Abstract

We prove the consistency of the Power-Law Fit (PLFit) method proposed by Clauset et al. (2009) for estimating the power-law exponent of data drawn from a distribution with a regularly varying tail. In the complex-systems community, PLFit has become the method of choice for estimating power-law exponents, yet its mathematical properties remain poorly understood. The difficulty is that PLFit is a minimum-distance estimator: it first chooses the threshold that minimizes the Kolmogorov-Smirnov distance between the data points above the threshold and the Pareto tail, and then applies the Hill estimator to this restricted data. Since the number of order statistics used is random, the general consistency theory for power-law exponent estimators from extreme value theory does not apply. Our proof first shows that the Hill estimator is consistent whenever the number of order statistics used is an intermediate sequence, even when that number is random. Here, we call a sequence intermediate when it grows to infinity while remaining much smaller than the sample size. The second, and most involved, step is to prove that the optimizer in PLFit is, with high probability, an intermediate sequence, unless the distribution has a Pareto tail above a certain value. For this latter special case, we give a separate proof.
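The two-step procedure described in the abstract (pick the threshold by minimizing a Kolmogorov-Smirnov distance, then apply the Hill estimator above it) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code or Clauset et al.'s reference implementation. The function names hill_estimator, ks_to_pareto and plfit are hypothetical. The sketch assumes continuous data without ties and uses one common indexing convention: the k largest values X_(1) >= ... >= X_(k) are scaled by the (k+1)-th largest value X_(k+1), the Hill estimate of the tail index is 1 / ((1/k) * sum_{i<=k} log(X_(i)/X_(k+1))), and the distance is the supremum over y >= 1 between the empirical tail of the scaled values and the Pareto survival function y^(-alpha). Clauset et al. parametrize by the density exponent, which equals this tail index plus one.

```python
import math
import random

# Hypothetical helper names; a sketch of the PLFit idea, not the paper's code.
# Assumes continuous data (no ties among the order statistics).


def hill_estimator(desc, k):
    """Hill estimate of the tail index from the k largest values.

    `desc` is sorted in decreasing order and desc[k], the (k+1)-th largest
    value, is the threshold: alpha_hat = 1 / mean(log(desc[i] / desc[k])).
    """
    threshold = desc[k]
    mean_log = sum(math.log(desc[i] / threshold) for i in range(k)) / k
    return 1.0 / mean_log


def ks_to_pareto(desc, k, alpha):
    """Sup over y >= 1 of |empirical tail of desc[i]/desc[k] (i < k) - y**(-alpha)|."""
    threshold = desc[k]
    # Scaled exceedances in increasing order: r_1 <= ... <= r_k.
    ratios = [desc[i] / threshold for i in range(k - 1, -1, -1)]
    dist = 0.0
    for j, r in enumerate(ratios, start=1):
        pareto_tail = r ** (-alpha)
        # The empirical tail drops from (k-j+1)/k to (k-j)/k at r_j; both
        # curves are decreasing, so the supremum sits at one side of a jump.
        dist = max(dist,
                   abs((k - j + 1) / k - pareto_tail),
                   abs((k - j) / k - pareto_tail))
    return dist


def plfit(data, k_min=2):
    """Return (k_star, alpha_hat): the k minimizing the KS distance and the
    Hill estimate computed from that many upper order statistics."""
    if len(data) <= k_min:
        raise ValueError("need more than k_min observations")
    desc = sorted(data, reverse=True)
    best = None  # (distance, k, alpha)
    for k in range(k_min, len(desc)):
        alpha = hill_estimator(desc, k)
        d = ks_to_pareto(desc, k, alpha)
        if best is None or d < best[0]:
            best = (d, k, alpha)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated exact Pareto sample with tail index 2: P(X > x) = x**(-2), x >= 1.
    sample = [rng.paretovariate(2.0) for _ in range(1000)]
    k_star, alpha_hat = plfit(sample)
    print(k_star, alpha_hat)
```

Running the script prints the selected number of order statistics and the corresponding Hill estimate. The exhaustive scan over every k costs O(n^2) operations, which is fine for illustration but slow for large samples.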
