Paper Title
Robust and Efficient Bayesian Inference for Non-Probability Samples
Authors
Abstract
Declining response rates in probability surveys, along with the widespread availability of unstructured data, have led to growing research into non-probability samples. Existing robust approaches are not well developed for non-Gaussian outcomes and may perform poorly in the presence of influential pseudo-weights. Furthermore, their variance estimators lack a unified framework and often rely on asymptotic theory. To address these gaps, we propose an alternative Bayesian approach using a partially linear Gaussian process regression that utilizes a prediction model with a flexible function of the pseudo-inclusion probabilities to impute the outcome variable for the reference survey. By efficiency, we mean not only computational scalability but also superiority with respect to variance. We also show that Gaussian process regression behaves as a kernel matching technique based on the estimated propensity scores, which yields double robustness and lowers sensitivity to influential pseudo-weights. Using the simulated posterior predictive distribution, one can directly quantify the uncertainty of the proposed estimator and derive the associated $95\%$ credible intervals. We assess the repeated sampling properties of our method in two simulation studies. The application in this study models count data with varying exposures under a non-probability sample setting.
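The imputation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several simplifications are assumed: the outcome is Gaussian rather than a count, the propensity model is an unweighted logistic regression on the stacked samples, and the Gaussian process hyperparameters (`tau2`, `sigma2`, `length_scale`) are fixed by hand instead of being sampled from a posterior. All function names (`fit_logistic`, `rbf_kernel`, `plgp_predict`) are hypothetical. The sketch fits a linear term in the covariates plus a Gaussian process in the estimated propensity score, then predicts the outcome for the reference survey.

```python
import numpy as np

def fit_logistic(X, z, n_iter=50):
    """Logistic regression via Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        # Tiny ridge keeps the Hessian invertible; it does not move the optimum.
        hessian = X.T @ (w[:, None] * X) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (z - p))
    return beta

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of propensity scores."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def plgp_predict(X_obs, s_obs, y_obs, X_new, s_new,
                 tau2=1.0, sigma2=0.25, length_scale=1.0):
    """Plug-in predictive mean of y = X beta + f(s) + noise, with f ~ GP(0, tau2 * K).

    Assumes fixed hyperparameters and a flat prior on beta, so beta is the
    generalized least squares estimate under V = tau2 * K + sigma2 * I.
    """
    V = tau2 * rbf_kernel(s_obs, s_obs, length_scale) + sigma2 * np.eye(len(y_obs))
    Vinv_X = np.linalg.solve(V, X_obs)
    Vinv_y = np.linalg.solve(V, y_obs)
    beta = np.linalg.solve(X_obs.T @ Vinv_X, X_obs.T @ Vinv_y)
    alpha = np.linalg.solve(V, y_obs - X_obs @ beta)
    # Units with similar scores get similar GP corrections (kernel matching).
    K_new = tau2 * rbf_kernel(s_new, s_obs, length_scale)
    return X_new @ beta + K_new @ alpha

# Toy population (entirely simulated; no values come from the paper).
rng = np.random.default_rng(0)
N = 5000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=N)

# Non-probability sample A: selection depends on x, so its raw mean is biased.
in_A = rng.random(N) < 1.0 / (1.0 + np.exp(-(-3.0 + x)))
# Reference survey B: simple random sample with equal design weights.
idx_B = rng.choice(N, size=300, replace=False)
w_B = np.full(300, N / 300)

X_A = np.column_stack([np.ones(in_A.sum()), x[in_A]])
X_B = np.column_stack([np.ones(300), x[idx_B]])

# Propensity of belonging to A, estimated on the stacked samples.
gamma = fit_logistic(np.vstack([X_A, X_B]),
                     np.concatenate([np.ones(len(X_A)), np.zeros(300)]))
s_A, s_B = X_A @ gamma, X_B @ gamma  # scores on the logit scale

# Impute y for the reference survey, then take the design-weighted mean.
y_hat_B = plgp_predict(X_A, s_A, y[in_A], X_B, s_B)
estimate = np.sum(w_B * y_hat_B) / np.sum(w_B)

print("naive mean of A:", y[in_A].mean())
print("imputation estimate:", estimate)
print("population mean:", y.mean())
```

A fully Bayesian version would place priors on the hyperparameters and draw from the posterior predictive distribution of the imputed outcomes, from which credible intervals for the estimate could be read off. The sketch above computes only a point prediction.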