Learn molecular representations from large-scale unlabeled molecules for drug discovery

Paper title

Learn molecular representations from large-scale unlabeled molecules for drug discovery

Authors

Li, Pengyong; Wang, Jun; Qiao, Yixuan; Chen, Hao; Yu, Yihuan; Yao, Xiaojun; Gao, Peng; Xie, Guotong; Song, Sen

Abstract

How to produce expressive molecular representations is a fundamental challenge in AI-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and generalize poorly. Here, we propose a novel Molecular Pre-training Graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose a powerful MolGNet model and an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet can capture valuable chemical insights to produce interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, involving 13 benchmark datasets. Our work demonstrates that MPG is a promising new approach for the drug discovery pipeline.
