Using ontology embeddings for structural inductive bias in gene expression data analysis

Paper title

Using ontology embeddings for structural inductive bias in gene expression data analysis

Authors

Trębacz, Maja; Shams, Zohreh; Jamnik, Mateja; Scherer, Paul; Simidjievski, Nikola; Terre, Helena Andres; Liò, Pietro

Abstract

Stratifying cancer patients based on their gene expression levels can improve diagnosis, survival analysis and treatment planning. However, such data is extremely high-dimensional, as it contains expression values for over 20000 genes per patient, and the number of samples in the datasets is low. To deal with such settings, we propose to incorporate prior biological knowledge about genes from ontologies into the machine learning system for the task of patient classification given their gene expression data. We use ontology embeddings that capture the semantic similarities between the genes to direct a Graph Convolutional Network, and thereby sparsify the network connections. We show that this approach provides an advantage for predicting clinical targets from high-dimensional low-sample data.
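
The idea of letting gene embeddings decide which connections a graph convolution keeps can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not specify it: the random vectors standing in for ontology embeddings, the k-nearest-neighbour rule with cosine similarity, the symmetric degree normalisation, and the names `knn_adjacency`, `normalise` and `graph_conv`. It is not the authors' implementation.

```python
import numpy as np


def knn_adjacency(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Link each gene to its k most cosine-similar genes (assumed sparsification rule)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a gene is not its own neighbour when ranking
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    n = embeddings.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)  # make the graph undirected
    np.fill_diagonal(adj, 1.0)  # add self-loops
    return adj


def normalise(adj: np.ndarray) -> np.ndarray:
    """Symmetric degree normalisation D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x: np.ndarray, adj_norm: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ x @ weight, 0.0)


rng = np.random.default_rng(0)
n_genes, emb_dim, k = 6, 4, 2  # toy sizes; real data has over 20000 genes

# Hypothetical stand-in for ontology embeddings of the genes.
gene_embeddings = rng.normal(size=(n_genes, emb_dim))
adj = knn_adjacency(gene_embeddings, k)
adj_norm = normalise(adj)

expression = rng.normal(size=(n_genes, 1))  # one patient's expression values
weight = rng.normal(size=(1, 3))  # untrained layer weights
hidden = graph_conv(expression, adj_norm, weight)
print(hidden.shape)
```

In this sketch each gene only aggregates information from genes whose embeddings are close, so the number of connections grows with `k` rather than with the square of the number of genes.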
