Paper Title

On block-wise and reference panel-based estimators for genetic data prediction in high dimensions

Authors

Bingxin Zhao, Shurong Zheng, Hongtu Zhu

Abstract

Genetic prediction of complex traits and diseases has attracted enormous attention in precision medicine, mainly because it has the potential to translate discoveries from genome-wide association studies (GWAS) into medical advances. Because the high-dimensional covariance matrix (or linkage disequilibrium (LD) pattern) of genetic variants has a block-diagonal structure, many existing methods attempt to account for the dependence among variants in predetermined local LD blocks/regions. Moreover, due to privacy restrictions and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than from the original training dataset. This paper presents a unified analysis of block-wise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, block-wise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training dataset and on external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having access only to summary-level data from the training dataset. This analysis is based on our novel results in random matrix theory for block-diagonal covariance matrices. We numerically evaluate our results using extensive simulations and a large-scale UK Biobank real data analysis of 36 complex traits.
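To make the three settings contrasted in the abstract concrete, the following is a minimal simulation sketch, not the authors' method. It assumes a ridge-type adjustment of marginal (GWAS-style) effects, an AR(1) correlation inside each LD block, Gaussian predictors, and dense Gaussian effects; the abstract specifies none of these. All function names and parameter values (`block_diag_cov`, `block_mask`, `ridge_adjust`, `compare_estimators`, `rho`, `lam`) are hypothetical. It only shows how a whole-covariance estimator, a block-wise estimator on the training data, and a block-wise estimator on a reference panel could be set up side by side, and makes no claim about which performs best.

```python
import numpy as np


def block_diag_cov(n_blocks, block_size, rho):
    """Block-diagonal covariance; AR(1) correlation within each block (assumed)."""
    idx = np.arange(block_size)
    block = rho ** np.abs(idx[:, None] - idx[None, :])
    return np.kron(np.eye(n_blocks), block)


def block_mask(n_blocks, block_size):
    """0/1 matrix that keeps within-block entries and zeroes the rest."""
    return np.kron(np.eye(n_blocks), np.ones((block_size, block_size)))


def ridge_adjust(S, b_marg, lam):
    """Ridge-type adjustment (assumed form): solve (S + lam * I) beta = b_marg."""
    return np.linalg.solve(S + lam * np.eye(S.shape[0]), b_marg)


def compare_estimators(n_blocks=20, block_size=10, n_train=300, n_ref=300,
                       n_test=500, rho=0.5, lam=0.5, seed=0):
    """Return out-of-sample prediction correlations for three estimators."""
    rng = np.random.default_rng(seed)
    p = n_blocks * block_size
    chol = np.linalg.cholesky(block_diag_cov(n_blocks, block_size, rho))

    def draw(n):
        # n samples with the block-diagonal covariance defined above
        return rng.standard_normal((n, p)) @ chol.T

    X_train, X_ref, X_test = draw(n_train), draw(n_ref), draw(n_test)

    # Dense effects: no sparsity restriction, as in the abstract's setting.
    beta = rng.standard_normal(p) / np.sqrt(p)
    y_train = X_train @ beta + rng.standard_normal(n_train)
    y_test = X_test @ beta + rng.standard_normal(n_test)

    # Summary-level data from the training set: marginal effects only.
    b_marg = X_train.T @ y_train / n_train

    S_train = X_train.T @ X_train / n_train  # LD estimated from training data
    S_ref = X_ref.T @ X_ref / n_ref          # LD estimated from reference panel
    mask = block_mask(n_blocks, block_size)

    estimates = {
        "whole_train": ridge_adjust(S_train, b_marg, lam),
        "block_train": ridge_adjust(S_train * mask, b_marg, lam),
        "block_ref": ridge_adjust(S_ref * mask, b_marg, lam),
    }
    return {name: float(np.corrcoef(X_test @ b, y_test)[0, 1])
            for name, b in estimates.items()}


if __name__ == "__main__":
    for name, r in compare_estimators().items():
        print(f"{name}: out-of-sample correlation = {r:.3f}")
```

Varying `n_train`, `n_ref`, and the number of variants relative to the sample size is one way to explore the high-dimensional regime the abstract refers to; the numbers this toy setup produces should not be read as reproducing the paper's results.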
