LowCon: A design-based subsampling approach

Paper title

LowCon: A design-based subsampling approach in a misspecified linear model

Authors

Cheng Meng, Rui Xie, Abhyuday Mandal, Xinlian Zhang, Wenxuan Zhong, Ping Ma

Abstract

We consider a measurement-constrained supervised learning problem in which (1) the full sample of predictors is given, and (2) the response observations are unavailable and expensive to measure. It is therefore desirable to select a subsample of predictor observations, measure the corresponding responses, and then fit the supervised learning model on that subsample of predictors and responses. However, model fitting is a trial-and-error process, and a postulated model for the data could be misspecified. Our empirical studies demonstrate that most existing subsampling methods perform unsatisfactorily when the model is misspecified. In this paper, we develop a novel subsampling method, called "LowCon", which outperforms the competing methods when the working linear model is misspecified. Our method uses orthogonal Latin hypercube designs to achieve robust estimation. We show that the proposed design-based estimator approximately minimizes the so-called "worst-case" bias with respect to many possible misspecification terms. Both simulated and real-data analyses demonstrate that the proposed estimator is more robust than several subsample least squares estimators obtained by state-of-the-art subsampling methods.
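
The workflow described in the abstract (choose design points, select the matching predictor observations, measure their responses, fit a linear model) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. It swaps in a plain random Latin hypercube where the paper uses an orthogonal Latin hypercube design, and it assumes the design is scaled to the full data range and matched to the data by nearest neighbor. The function names, the simulated data, and the sine misspecification term are all hypothetical.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Random Latin hypercube in the unit cube (NOT an orthogonal LHD):
    each column places exactly one point in each of n equal strata."""
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.random((n, d))) / n

def design_based_subsample(X, n_sub, rng):
    """Return indices of distinct rows of X nearest to the design points.
    Assumption: the design is scaled to the min/max range of each predictor."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    design = lo + latin_hypercube(n_sub, X.shape[1], rng) * (hi - lo)
    available = np.ones(len(X), dtype=bool)
    chosen = []
    for point in design:
        dist = np.sum((X - point) ** 2, axis=1)
        dist[~available] = np.inf      # never pick the same row twice
        idx = int(np.argmin(dist))
        chosen.append(idx)
        available[idx] = False
    return np.array(chosen)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))         # full sample of predictors
idx = design_based_subsample(X, 50, rng)

# Responses are "measured" only for the selected rows. The sine term plays
# the role of a misspecification that the working linear model ignores.
y_sub = (1.0 + X[idx] @ np.array([2.0, -1.0])
         + 0.5 * np.sin(X[idx, 0])
         + rng.normal(scale=0.1, size=len(idx)))

# Fit the working linear model by least squares on the subsample.
A = np.column_stack([np.ones(len(idx)), X[idx]])
beta_hat, *_ = np.linalg.lstsq(A, y_sub, rcond=None)
```

Spreading the selected points evenly over the predictor space, rather than concentrating them at high-leverage observations, is the intuition behind the robustness claim; the sketch does not reproduce the paper's bias guarantees.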
