Paper title
Joint integrative analysis of multiple data sources with correlated vector outcomes
Paper authors
Paper abstract
We propose a distributed quadratic inference function framework to jointly estimate regression parameters from multiple potentially heterogeneous data sources with correlated vector outcomes. The primary goal of this joint integrative analysis is to estimate covariate effects on all outcomes through a marginal regression model in a statistically and computationally efficient way. We develop a data integration procedure for statistical estimation and inference of regression parameters that is implemented in a fully distributed and parallelized computational scheme. To overcome the computational and modeling challenges arising from the high-dimensional likelihood of the correlated vector outcomes, we propose to analyze each data source using the quadratic inference functions of Qu, Lindsay and Li (2000), and then to jointly reestimate parameters from each data source by accounting for correlation between data sources using a combined meta-estimator in the spirit of the generalised method of moments of Hansen (1982). We show both theoretically and numerically that the proposed method yields efficiency improvements and is computationally fast. We illustrate the proposed methodology with the joint integrative analysis of the association between smoking and metabolites in a large multi-cohort study and provide an R package for ease of implementation.
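As a rough illustration of the first building block named in the abstract, the sketch below evaluates a quadratic inference function for a single data source. It assumes a linear marginal model with identity link and unit variances, and an exchangeable working correlation represented by two basis matrices (the identity, and a matrix with zeros on the diagonal and ones elsewhere). The function name `qif_objective`, the array shapes and the simulated data are hypothetical and are not taken from the paper or its R package; the step that combines data sources is not shown.

```python
import numpy as np


def qif_objective(beta, X, Y):
    """Quadratic inference function for one data source (illustrative only).

    X has shape (N, m, p): N subjects, m outcomes per subject, p covariates.
    Y has shape (N, m). Assumes identity link and unit marginal variances.
    """
    N, m, p = X.shape
    # Basis matrices for an exchangeable working correlation (an assumption).
    M1 = np.eye(m)
    M2 = np.ones((m, m)) - np.eye(m)
    resid = Y - X @ beta  # shape (N, m)
    # Extended score per subject: stack X_i^T M_j r_i over j = 1, 2.
    g = np.concatenate(
        [np.einsum("imp,mk,ik->ip", X, M, resid) for M in (M1, M2)],
        axis=1,
    )  # shape (N, 2p)
    g_bar = g.mean(axis=0)
    C = g.T @ g / N  # sample second-moment matrix of the extended scores
    # Pseudo-inverse guards against a singular C in small samples.
    return float(N * g_bar @ np.linalg.pinv(C) @ g_bar)


# Simulated example with a shared subject effect inducing correlation.
rng = np.random.default_rng(0)
N, m, p = 200, 4, 2
X = rng.normal(size=(N, m, p))
beta_true = np.array([1.0, -0.5])
shared = rng.normal(size=(N, 1))
Y = X @ beta_true + shared + rng.normal(size=(N, m))

q_true = qif_objective(beta_true, X, Y)
q_far = qif_objective(beta_true + 5.0, X, Y)
```

In this simulated setting the objective is smaller at the data-generating parameter than at a distant one, which is why minimizing it over `beta` can serve as an estimator. A numerical optimizer would be needed to find that minimizer.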