Paper title
Recovering individual-level spatial inference from aggregated binary data
Authors
Abstract
Binary regression models are commonly used in disciplines such as epidemiology and ecology to determine how spatial covariates influence individuals. In many studies, binary data are shared in a spatially aggregated form to protect privacy. For example, rather than reporting the location and result for each individual that was tested for a disease, researchers may report that a disease was detected or not detected within geopolitical units. Often, the spatial aggregation process obscures the values of response variables, spatial covariates, and locations of each individual, which makes recovering individual-level inference difficult. We show that applying a series of transformations, including a change of support, to a bivariate point process model allows researchers to recover individual-level inference for spatial covariates from spatially aggregated binary data. The series of transformations preserves the convenient interpretation of desirable binary regression models that are commonly applied to individual-level data. Using a simulation experiment, we compare the performance of our proposed method under varying types of spatial aggregation against the performance of standard approaches using the original individual-level data. We illustrate our method by modeling individual-level probability of infection using a data set that has been aggregated to protect an at-risk and endangered species of bats. Our simulation experiment and data illustration demonstrate the utility of the proposed method when access to original non-aggregated data is impractical or prohibited.
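The kind of aggregation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' bivariate point process method; it only generates individual-level binary outcomes from a logistic regression on a spatial covariate and then collapses them to a "detected / not detected" indicator per grid cell, which is the form of data the abstract says is often shared. Every name and value here (`simulate_aggregated`, the regular grid standing in for geopolitical units, the covariate, the coefficients `beta0` and `beta1`) is a hypothetical choice made for illustration.

```python
import numpy as np


def simulate_aggregated(n=500, beta0=-2.0, beta1=1.5, cells=4, seed=0):
    """Simulate individual binary outcomes, then aggregate them to grid cells.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Individual locations, uniform on the unit square.
    xy = rng.uniform(0.0, 1.0, size=(n, 2))

    # Hypothetical spatial covariate: the east-west coordinate of each individual.
    covariate = xy[:, 0]

    # Individual-level logistic regression for the probability of infection.
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * covariate)))
    y = rng.binomial(1, p)

    # Assign each individual to a cell of a cells x cells grid
    # (a stand-in for geopolitical units).
    col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
    row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
    cell_id = row * cells + col

    # Aggregate: number tested per cell, and whether any positive was found.
    n_cells = cells * cells
    tested = np.bincount(cell_id, minlength=n_cells)
    positives = np.bincount(cell_id, weights=y, minlength=n_cells)
    detected = (positives > 0).astype(int)

    return y, cell_id, tested, detected


if __name__ == "__main__":
    y, cell_id, tested, detected = simulate_aggregated()
    print("individuals tested per cell:", tested)
    print("disease detected per cell:  ", detected)
```

After aggregation only `detected` (and possibly `tested`) is available; the individual outcomes `y`, locations, and covariate values are hidden. If individuals are assumed independent, the probability that a cell reports a detection is one minus the product of the individual non-infection probabilities in that cell, which is one way to see why the individual-level coefficients are not directly readable from the aggregated record.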