Paper Title

Inference for linear functionals of high-dimensional longitudinal proteomics data using generalized estimating equations

Authors

Xia, Lu; Shojaie, Ali

Abstract

Regression analysis of correlated data, where multiple correlated responses are recorded on the same unit, is ubiquitous in many scientific areas. With the advent of new technologies, in particular high-throughput omics profiling assays, such correlated data increasingly consist of a large number of variables relative to the available sample size. Motivated by recent longitudinal proteomics studies of COVID-19, we propose a novel inference procedure for linear functionals of high-dimensional regression coefficients in generalized estimating equations, which are widely used to analyze correlated data. Our estimator for this more general inferential target, obtained by constructing projected estimating equations, is shown to be asymptotically normally distributed under mild regularity conditions. We also introduce a data-driven cross-validation procedure to select the tuning parameter for estimating the projection direction, an issue not addressed by existing procedures. We illustrate the utility of the proposed procedure in providing confidence intervals for associations of individual proteins and severe COVID risk scores obtained based on high-dimensional proteomics data, and demonstrate its robust finite-sample performance, especially in estimation bias and confidence interval coverage, via extensive simulations.
