Biomedical Information Extraction for Disease Gene Prioritization

Paper Title

Biomedical Information Extraction for Disease Gene Prioritization

Authors

Jupinder Parmar, William Koehler, Martin Bringmann, Katharina Sophia Volz, Berk Kapicioglu

Abstract

We introduce a biomedical information extraction (IE) pipeline that extracts biological relationships from text and demonstrate that its components, such as named entity recognition (NER) and relation extraction (RE), outperform the state of the art in BioNLP. We apply it to tens of millions of PubMed abstracts to extract protein-protein interactions (PPIs) and add these extractions to a biomedical knowledge graph that already contains PPIs extracted from STRING, the leading structured PPI database. We show that, even though the graph already contains PPIs from an established structured source, augmenting it with our own IE-based extractions allows us to predict novel disease-gene associations with a 20% relative increase in hit@30, an important step towards developing drug targets for uncured diseases.
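
The abstract reports its headline result as a relative increase in hit@30. The exact evaluation protocol is not given here, so the following is only a minimal sketch under one common reading: hit@k is the fraction of held-out (disease, gene) pairs whose gene appears among the top k ranked candidates for that disease. The function names, the toy rankings, and the scores 0.25 and 0.30 are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def hit_at_k(
    ranked_genes: Dict[str, List[str]],
    held_out_pairs: List[Tuple[str, str]],
    k: int = 30,
) -> float:
    """Fraction of held-out (disease, gene) pairs whose gene is in the
    top-k ranked candidates for that disease (assumed definition)."""
    if not held_out_pairs:
        return 0.0
    hits = 0
    for disease, gene in held_out_pairs:
        top_k = ranked_genes.get(disease, [])[:k]
        if gene in top_k:
            hits += 1
    return hits / len(held_out_pairs)


def relative_increase(baseline: float, augmented: float) -> float:
    """Relative change of the augmented score over the baseline score."""
    return (augmented - baseline) / baseline


# Toy data (hypothetical): candidate genes per disease, best first.
ranked = {
    "D1": ["G1", "G2", "G3"],
    "D2": ["G4", "G5", "G6"],
}
# Held-out disease-gene associations to recover (hypothetical).
pairs = [("D1", "G2"), ("D1", "G3"), ("D2", "G4"), ("D2", "G6")]

score_at_2 = hit_at_k(ranked, pairs, k=2)

# Illustrative scores only: a baseline of 0.25 and an augmented score of
# 0.30 correspond to a relative increase of about 20%.
gain = relative_increase(0.25, 0.30)
```

Under this reading, a 20% relative increase means the augmented graph's hit@30 is about 1.2 times the baseline's, not that the score rose by 20 percentage points.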
