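Assessing Selection Bias in Regression Coefficients Estimated from Non-Probability Samples, with Applications to Genetics and Demographic Surveys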

Paper Title

Assessing Selection Bias in Regression Coefficients Estimated from Non-Probability Samples, with Applications to Genetics and Demographic Surveys

Authors

West, Brady T., Little, Roderick J. A., Andridge, Rebecca R., Boonstra, Philip S., Ware, Erin B., Pandit, Anita, Alvarado-Leiton, Fernanda

Abstract

Selection bias is a serious potential problem for inference about relationships of scientific interest based on samples without well-defined probability sampling mechanisms. Motivated by the potential for selection bias in (a) estimated relationships of polygenic scores (PGSs) with phenotypes in genetic studies of volunteers, and (b) estimated differences in subgroup means in surveys of smartphone users, we derive novel measures of selection bias for estimates of the coefficients in linear and probit regression models fitted to non-probability samples, when aggregate-level auxiliary data are available for the selected sample and the target population. The measures arise from normal pattern-mixture models that allow analysts to examine the sensitivity of their inferences to assumptions about non-ignorable selection in these samples. We examine the effectiveness of the proposed measures in a simulation study, and then use them to quantify the selection bias in (a) estimated PGS-phenotype relationships in a large study of volunteers recruited via Facebook, and (b) estimated subgroup differences in mean past-year employment duration in a non-probability sample of low-educated smartphone users. We evaluate the performance of the measures in these applications using benchmark estimates from large probability samples.
