Molecular machine learning with conformer ensembles

Paper title

Molecular machine learning with conformer ensembles

Authors

Simon Axelrod, Rafael Gomez-Bombarelli

Abstract

Virtual screening can accelerate drug discovery by identifying promising candidates for experimental evaluation. Machine learning is a powerful method for screening, as it can learn complex structure-property relationships from experimental data and make rapid predictions over virtual libraries. Molecules inherently exist as a three-dimensional ensemble and their biological action typically occurs through supramolecular recognition. However, most deep learning approaches to molecular property prediction use a 2D graph representation as input, and in some cases a single 3D conformation. Here we investigate how the 3D information of multiple conformers, traditionally known as 4D information in the cheminformatics community, can improve molecular property prediction in deep learning models. We introduce multiple deep learning models that expand upon key architectures such as ChemProp and Schnet, adding elements such as multiple-conformer inputs and conformer attention. We then benchmark the performance trade-offs of these models on 2D, 3D and 4D representations in the prediction of drug activity using a large training set of geometrically resolved molecules. The new architectures perform significantly better than 2D models, but their performance is often just as strong with a single conformer as with many. We also find that 4D deep learning models learn interpretable attention weights for each conformer.
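The abstract mentions "conformer attention" without detail. As a rough illustration only, the sketch below shows one generic way to pool per-conformer embeddings into a single molecule-level vector using softmax attention weights. It is not the authors' architecture: the function name `attention_pool`, the dot-product scoring, and the `query` vector (standing in for a learned parameter) are all assumptions made for this example.

```python
import math


def attention_pool(conformer_embeddings, query):
    """Pool per-conformer embeddings into one molecule-level vector.

    conformer_embeddings: list of equal-length float lists, one per conformer.
    query: float list of the same length (hypothetical stand-in for a
        learned attention parameter).
    Returns (pooled_vector, attention_weights).
    """
    # Score each conformer with a dot product against the query vector.
    scores = [
        sum(h * q for h, q in zip(emb, query)) for emb in conformer_embeddings
    ]
    # Softmax over conformers; subtracting the max keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of the conformer embeddings.
    dim = len(conformer_embeddings[0])
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, conformer_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


# Three toy conformer embeddings of dimension 2 (made-up numbers).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(embeddings, query=[0.5, -0.5])
```

In a sketch like this, the per-conformer `weights` are what one would inspect for interpretability, since they sum to one and indicate how much each conformer contributes to the pooled representation.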

Paper title