Title
The Asymptotic Distribution of the MLE in High-dimensional Logistic Models: Arbitrary Covariance
Authors
Abstract
We study the distribution of the maximum likelihood estimate (MLE) in high-dimensional logistic models, extending the recent results from Sur (2019) to the case where the Gaussian covariates may have an arbitrary covariance structure. We prove that in the limit of large problems holding the ratio between the number $p$ of covariates and the sample size $n$ constant, every finite list of MLE coordinates follows a multivariate normal distribution. Concretely, the $j$th coordinate $\hat{\beta}_j$ of the MLE is asymptotically normally distributed with mean $\alpha_\star \beta_j$ and standard deviation $\sigma_\star/\tau_j$; here, $\beta_j$ is the value of the true regression coefficient, and $\tau_j$ the standard deviation of the $j$th predictor conditional on all the others. The numerical parameters $\alpha_\star > 1$ and $\sigma_\star$ only depend upon the problem dimensionality $p/n$ and the overall signal strength, and can be accurately estimated. Our results imply that the MLE's magnitude is biased upwards and that the MLE's standard deviation is greater than that predicted by classical theory. We present a series of experiments on simulated and real data showing excellent agreement with the theory.
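The quantities in the abstract can be illustrated with a minimal Python sketch, which is not the authors' code. It relies on the standard Gaussian identity that the conditional variance of the $j$th predictor given the others equals $1/(\Sigma^{-1})_{jj}$, and it uses the stated normal limit to rescale one MLE coordinate and form an interval. The function names, the AR(1)-style covariance, and the values of `alpha_star` and `sigma_star` are hypothetical placeholders; the abstract says only that these two parameters can be estimated, not how. The sketch takes the standard deviation $\sigma_\star/\tau_j$ exactly as the abstract states it, so any additional normalization by the sample size is assumed to be absorbed into the inputs.

```python
import numpy as np


def conditional_std(Sigma):
    """Return tau_j, the sd of predictor j given all other predictors.

    For a Gaussian vector with covariance Sigma,
    Var(X_j | X_{-j}) = 1 / (Sigma^{-1})_{jj}.
    """
    precision = np.linalg.inv(Sigma)
    return 1.0 / np.sqrt(np.diag(precision))


def adjusted_interval(beta_hat_j, tau_j, alpha_star, sigma_star, z=1.96):
    """Interval for beta_j based on the limit stated in the abstract:
    beta_hat_j is approximately N(alpha_star * beta_j, (sigma_star / tau_j)^2).

    Dividing by alpha_star removes the multiplicative bias and rescales
    the standard deviation by the same factor.
    """
    center = beta_hat_j / alpha_star
    half_width = z * sigma_star / (alpha_star * tau_j)
    return center - half_width, center + half_width


# Illustrative (assumed) covariance: Sigma_ij = rho^|i-j| with p = 4.
rho, p = 0.5, 4
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
tau = conditional_std(Sigma)

# Hypothetical values; in practice alpha_star and sigma_star must be estimated.
alpha_star, sigma_star = 1.5, 3.0
beta_hat_j = 2.4  # hypothetical MLE coordinate
lo, hi = adjusted_interval(beta_hat_j, tau[0], alpha_star, sigma_star)
print(tau, (lo, hi))
```

With correlated predictors, each $\tau_j$ is smaller than the marginal standard deviation of the predictor, so the interval widens as the predictor becomes easier to explain from the others.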