Paper title
A guided network propagation approach to identify disease genes that combines prior and new information
Paper authors
Abstract
A major challenge in biomedical data science is to identify the causal genes underlying complex genetic diseases. Despite the massive influx of genome sequencing data, identifying disease-relevant genes remains difficult as individuals with the same disease may share very few, if any, genetic variants. Protein-protein interaction networks provide a means to tackle this heterogeneity, as genes causing the same disease tend to be proximal within networks. Previously, network propagation approaches have spread signal across the network from either known disease genes or genes that are newly putatively implicated in the disease (e.g., found to be mutated in exome studies or linked via genome-wide association studies). Here we introduce a general framework that considers both sources of data within a network context. Specifically, we use prior knowledge of disease-associated genes to guide random walks initiated from genes that are newly identified as perhaps disease-relevant. In large-scale testing across 24 cancer types, we demonstrate that our approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. To demonstrate the versatility of our approach, we also apply it to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes.
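The idea of letting prior knowledge guide random walks started from newly implicated genes can be illustrated with a small sketch. The abstract does not specify the algorithm, so this is not the authors' implementation. It assumes one simple reading: a random walk with restart whose transition probabilities are biased toward neighbors with higher prior scores. The function name `guided_random_walk`, the parameters `restart` and `eps`, and the toy network are all hypothetical.

```python
import numpy as np

def guided_random_walk(adj, prior, new_signal, restart=0.3, eps=0.1,
                       tol=1e-10, max_iter=1000):
    """Random walk with restart, biased toward prior-supported nodes.

    Assumption (not from the paper): the edge u -> v is weighted by
    prior[v] + eps, so walks starting at newly implicated genes drift
    toward genes with prior disease evidence.
    """
    adj = np.asarray(adj, dtype=float)
    prior = np.asarray(prior, dtype=float)
    start = np.asarray(new_signal, dtype=float)
    start = start / start.sum()              # restart distribution (new data)

    # Bias each edge by the prior score of its target node.
    weighted = adj * (prior[np.newaxis, :] + eps)
    row_sums = weighted.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0            # avoid dividing by zero for isolated nodes
    transition = weighted / row_sums         # row-stochastic transition matrix

    p = start.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * (transition.T @ p) + restart * start
        delta = np.abs(nxt - p).sum()
        p = nxt
        if delta < tol:
            break
    return p

# Toy network: two symmetric branches, 0-1-3 and 0-2-4.
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
])
new_signal = np.array([1, 0, 0, 0, 0])       # gene 0 is newly implicated
prior = np.array([0, 1, 0, 0, 0])            # gene 1 has prior evidence

scores = guided_random_walk(adj, prior, new_signal)
ranking = np.argsort(-scores)                # genes ordered by propagated score
```

In this toy network the two branches are structurally identical, so any difference between the scores of genes 1 and 2 comes only from the prior. With a uniform prior the sketch reduces to an ordinary random walk with restart.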