Paper title
Federated Generalized Linear Mixed Models for Collaborative Genome-wide Association Studies
Paper authors
Paper abstract
As sequencing costs fall, there is a strong incentive to perform large-scale association studies to increase the power to detect new variants. Federated association testing among institutions is a viable way to increase sample sizes: sites share only intermediate test statistics, which a central server aggregates. There are, however, standing challenges to performing federated association testing. Association tests are known to be confounded by numerous factors such as population stratification, which can be especially important in multi-ancestral studies and in admixed populations among different sites. Furthermore, disease etiology should be modeled flexibly to avoid bias in the significance of the genetic effect. A growing challenge for large-scale association studies is the privacy of participants and the related ethical concerns of stigmatization and marginalization. Here, we present dMEGA, a flexible and efficient method for federated association testing based on generalized linear mixed models across multiple sites, without explicitly sharing the underlying genotype and phenotype data. dMEGA first uses a reference projection to estimate population-based covariates without sharing genotype datasets among sites. Next, dMEGA applies a Laplace approximation to the parameter likelihoods and decomposes parameter estimation into efficient local gradient updates among sites. We use simulated and real datasets to demonstrate the accuracy and efficiency of dMEGA. Overall, dMEGA's formulation is flexible enough to integrate fixed and random effects in a federated setting.
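The idea of decomposing parameter estimation into local gradient updates can be illustrated with a minimal sketch. This is not dMEGA's implementation: it assumes a plain logistic regression with fixed effects only, and it omits the random effects, the Laplace approximation, and the reference projection. The function names (`local_gradient`, `federated_fit`), the learning rate, and the toy data are hypothetical. Each site computes a gradient on its own data, and only those gradient vectors reach the aggregation step.

```python
import math

def local_gradient(X, y, beta):
    """Gradient of the logistic negative log-likelihood on one site's data."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-eta))
        for j, x in enumerate(xi):
            grad[j] += (p - yi) * x
    return grad

def federated_fit(sites, n_features, lr=0.1, n_iter=500):
    """Gradient descent in which sites contribute only gradient vectors.

    `sites` is a list of (X, y) pairs. In a real deployment each pair
    would stay at its own institution; here they share one process
    purely for illustration (an assumption of this sketch).
    """
    beta = [0.0] * n_features
    n_total = sum(len(y) for _, y in sites)
    for _ in range(n_iter):
        # Local step: every site evaluates its gradient at the current beta.
        grads = [local_gradient(X, y, beta) for X, y in sites]
        # Central step: sum the site gradients and update the shared beta.
        total = [sum(g[j] for g in grads) for j in range(n_features)]
        beta = [b - lr * t / n_total for b, t in zip(beta, total)]
    return beta

# Hypothetical toy data: an intercept column plus one covariate per sample.
site_a = ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [0, 0, 1])
site_b = ([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.0]], [0, 1, 0, 1])
pooled = (site_a[0] + site_b[0], site_a[1] + site_b[1])

beta_federated = federated_fit([site_a, site_b], n_features=2)
beta_pooled = federated_fit([pooled], n_features=2)
```

Because the log-likelihood is a sum over samples, the summed site gradients equal the gradient on the pooled data, so `beta_federated` and `beta_pooled` agree up to floating-point rounding. That additivity is what lets a central server aggregate local updates without seeing individual-level records.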