Paper Title

Self-Supervised Graph Transformer on Large-Scale Molecular Data

Paper Authors

Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, Junzhou Huang

Paper Abstract

How to obtain informative representations of molecules is a crucial prerequisite in AI-driven drug design and discovery. Recent research abstracts molecules as graphs and employs Graph Neural Networks (GNNs) for molecular representation learning. Nevertheless, two issues impede the usage of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization capability to newly synthesized molecules. To address them both, we propose a novel framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks at the node, edge, and graph levels, GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. Moreover, to encode such complex information, GROVER integrates Message Passing Networks into the Transformer-style architecture to deliver a class of more expressive encoders of molecules. The flexibility of GROVER allows it to be trained efficiently on large-scale molecular datasets without requiring any supervision, thus immunizing it against the two issues mentioned above. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning. We then leverage the pre-trained GROVER for molecular property prediction with task-specific fine-tuning, where we observe a huge improvement (more than 6% on average) over current state-of-the-art methods on 11 challenging benchmarks. The insight we gained is that well-designed self-supervision losses and highly expressive pre-trained models hold significant potential for boosting performance.
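As a rough illustration of the architecture the abstract describes, the sketch below embeds a small message passing network inside a Transformer-style self-attention block over the atoms of one molecule. This is not the authors' implementation: all class names (MessagePassing, GTransformerBlock), the mean-aggregation scheme, and the layer sizes are illustrative assumptions.

```python
# Minimal sketch (not the GROVER code): a message passing network
# feeding a Transformer-style attention block over atoms.
import torch
import torch.nn as nn


class MessagePassing(nn.Module):
    """A few rounds of mean-aggregation message passing over an adjacency matrix."""

    def __init__(self, dim: int, steps: int = 2):
        super().__init__()
        self.steps = steps
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n_atoms, dim); adj: (n_atoms, n_atoms), 1 where a bond exists
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        for _ in range(self.steps):
            msg = adj @ h / deg  # mean over bonded neighbours
            h = torch.relu(self.update(torch.cat([h, msg], dim=-1)))
        return h


class GTransformerBlock(nn.Module):
    """Self-attention whose queries, keys, and values come from message passing."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.mpn_q = MessagePassing(dim)
        self.mpn_k = MessagePassing(dim)
        self.mpn_v = MessagePassing(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        q = self.mpn_q(h, adj).unsqueeze(0)  # add a batch dim of 1
        k = self.mpn_k(h, adj).unsqueeze(0)
        v = self.mpn_v(h, adj).unsqueeze(0)
        out, _ = self.attn(q, k, v)
        return self.norm(h + out.squeeze(0))  # residual connection


if __name__ == "__main__":
    n_atoms, dim = 6, 32
    h = torch.randn(n_atoms, dim)  # initial atom features
    adj = (torch.rand(n_atoms, n_atoms) > 0.7).float()
    adj = ((adj + adj.T) > 0).float()  # symmetrise bonds
    block = GTransformerBlock(dim)
    print(block(h, adj).shape)  # torch.Size([6, 32])
```

In the full model, such blocks would be stacked and first pre-trained with the node-, edge-, and graph-level self-supervised losses the abstract mentions, before fine-tuning on labeled property-prediction tasks.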
