Paper Title

Biomedical Knowledge Graph Refinement and Completion using Graph Representation Learning and Top-K Similarity Measure

Authors

Islam Akef Ebeid, Majdi Hassan, Tingyi Wanyan, Jack Roper, Abhik Seal, Ying Ding

Abstract

Knowledge graphs have been one of the fundamental methods for integrating heterogeneous data sources. Such integration is crucial in the biomedical domain, where central data-driven tasks such as drug discovery rely on combining information from different biomedical databases. These databases contain various biological entities and relations, such as proteins (PDB), genes (Gene Ontology), drugs (DrugBank), diseases (DDB), and protein-protein interactions (BioGRID). The process of semantically integrating heterogeneous biomedical databases is often riddled with imperfections. The quality of data-driven drug discovery depends on both the accuracy of the mining methods used and the quality of the data. Thus, having complete and refined biomedical knowledge graphs is central to achieving more accurate drug discovery outcomes. Here we propose using the latest graph representation learning and embedding models to refine and complete biomedical knowledge graphs. This preliminary work demonstrates learning discrete representations of the integrated biomedical knowledge graph Chem2Bio2RDF [3]. We perform a knowledge graph completion and refinement task using a simple top-K cosine similarity measure between the learned embedding vectors to predict missing links between drugs and targets present in the data. We show that this simple procedure can be used as an alternative to binary classifiers in link prediction.
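The top-K cosine similarity step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `top_k_cosine_links`, the two-dimensional toy embeddings, and the choice of K are assumptions made purely for illustration, and the sketch assumes that no embedding vector has zero norm.

```python
import numpy as np


def top_k_cosine_links(drug_vecs, target_vecs, k):
    """For each drug, return the indices and scores of its k most similar targets.

    Hypothetical helper for illustration; assumes no row has zero norm.
    """
    # Scale every row to unit length so a dot product equals cosine similarity.
    d = drug_vecs / np.linalg.norm(drug_vecs, axis=1, keepdims=True)
    t = target_vecs / np.linalg.norm(target_vecs, axis=1, keepdims=True)
    sims = d @ t.T  # shape: (n_drugs, n_targets)
    # Sort each row by descending similarity and keep the first k columns.
    top_idx = np.argsort(-sims, axis=1)[:, :k]
    top_scores = np.take_along_axis(sims, top_idx, axis=1)
    return top_idx, top_scores


# Invented toy embeddings: 2 drugs and 3 targets in a 2-dimensional space.
drug_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
target_vecs = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])

top_idx, top_scores = top_k_cosine_links(drug_vecs, target_vecs, k=1)
print(top_idx)
```

In this reading, a drug-target pair is proposed as a candidate link when the target falls within the drug's top K neighbours, so the ranking takes the place of a trained binary classifier. How K is chosen and how candidates are validated are left open here.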
