Paper Title

Inference on High-dimensional Single-index Models with Streaming Data

Authors

Han, Dongxiao; Xie, Jinhan; Liu, Jin; Sun, Liuquan; Huang, Jian; Jian, Bei; Kong, Linglong

Abstract

由于流数据,传统的统计方法面临着新的挑战。主要的挑战是数据的迅速增长和速度,这使得在内存中存储如此大的数据集。本文为具有未知链接功能的高维半摩擦学单索引模型提供了一个在线推理框架,用于回归参数。提出的在线过程仅更新当前数据批次和历史数据的摘要统计信息,而不是重新访问整个原始数据集。同时,我们不需要估计未知的链接功能,这是一项高度挑战的任务。另外,在提出的推理程序中使用了广义凸损耗函数。为了说明所提出的方法,我们使用Huber损失函数和逻辑回归模型的负模样。在这项研究中,研究了所提出的在线脱叠套索估计器的渐近正态性以及所提出的在线拉索估计量的界限。为了评估所提出方法的性能,已经进行了广泛的仿真研究。我们为纳斯达克股价和财务困境数据集提供申请。

Traditional statistical methods face new challenges with streaming data. The major challenge is the rapidly growing volume and velocity of the data, which makes it impossible to store such huge datasets in memory. This paper presents an online inference framework for the regression parameters in high-dimensional semiparametric single-index models with unknown link functions. The proposed online procedure updates the estimates using only the current data batch and summary statistics of the historical data, instead of re-accessing the entire raw dataset. At the same time, we do not need to estimate the unknown link function, which is a highly challenging task. In addition, the proposed inference procedure accommodates a general convex loss function; to illustrate it, we use the Huber loss and the negative log-likelihood of the logistic regression model. We investigate the asymptotic normality of the proposed online debiased Lasso estimators and the bounds of the proposed online Lasso estimators. Extensive simulation studies are conducted to evaluate the performance of the proposed method. We also apply the method to Nasdaq stock price and financial distress datasets.
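The abstract names two convex losses and the idea of keeping summary statistics instead of raw data. The sketch below is an illustration only, not the authors' code: it writes out the standard Huber loss and the logistic negative log-likelihood, and, as a deliberately simplified stand-in, shows summary-statistic updates for the squared loss, where the statistics X^T X and X^T y are exact. The names `huber_loss`, `logistic_nll` and `StreamingGram` are hypothetical, the Huber threshold 1.345 is an assumed common default, and the paper's actual online debiased Lasso procedure for general convex losses is not reproduced here.

```python
import numpy as np


def huber_loss(r, delta=1.345):
    """Standard Huber loss: 0.5*r^2 if |r| <= delta, else delta*(|r| - delta/2).

    delta=1.345 is an assumed conventional default, not taken from the paper.
    """
    r = np.asarray(r, dtype=float)
    abs_r = np.abs(r)
    return np.where(abs_r <= delta, 0.5 * r**2, delta * (abs_r - 0.5 * delta))


def logistic_nll(y, eta):
    """Logistic regression negative log-likelihood for y in {0, 1}.

    Computes log(1 + exp(eta)) - y * eta, using logaddexp for numerical stability.
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return np.logaddexp(0.0, eta) - y * eta


class StreamingGram:
    """Hypothetical illustration of summary-statistic updates (squared loss only).

    Only X^T X, X^T y and the sample size are stored, so each raw batch can be
    discarded after it is processed.
    """

    def __init__(self, p):
        self.xtx = np.zeros((p, p))
        self.xty = np.zeros(p)
        self.n = 0

    def update(self, X_batch, y_batch):
        # Add the current batch's contribution to the running summaries.
        self.xtx += X_batch.T @ X_batch
        self.xty += X_batch.T @ y_batch
        self.n += X_batch.shape[0]


# Simulated data, processed in three batches of 100 observations each.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(size=300)

stream = StreamingGram(p=5)
for start in range(0, 300, 100):
    stream.update(X[start:start + 100], y[start:start + 100])
```

After the loop, `stream.xtx` and `stream.xty` match the full-data quantities X^T X and X^T y even though no batch was revisited. For the Huber and logistic losses such exact finite-dimensional summaries do not exist in general, which is why a dedicated online procedure such as the one proposed in the paper is needed.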