Paper Title
Retrieval-based Controllable Molecule Generation
Authors
Abstract
Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., those that (partially) satisfy the design criteria, to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. We design a retrieval mechanism that retrieves and fuses the exemplar molecules with the input molecule, which is trained by a new self-supervised objective that predicts the nearest neighbor of the input molecule. We also propose an iterative refinement process to dynamically update the generated molecules and retrieval database for better generalization. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning. On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods. Code is available at https://github.com/NVlabs/RetMol.
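The retrieve, fuse, generate, and refine loop described in the abstract can be illustrated with a toy sketch. This is not the RetMol implementation: molecules are stood in for by plain numeric vectors, the learned fusion module is replaced by simple averaging, the frozen pre-trained generator is replaced by Gaussian sampling, and every function name and parameter below (`retrieve`, `fuse`, `generate`, `refine`, `k`, `weight`, `noise`) is hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, database, k):
    """Return the k database entries most similar to the query."""
    return sorted(database, key=lambda e: cosine(query, e), reverse=True)[:k]


def fuse(query, exemplars, weight=0.5):
    """Toy fusion (assumption): blend the query with the exemplar mean."""
    dim = len(query)
    mean = [sum(e[i] for e in exemplars) / len(exemplars) for i in range(dim)]
    return [(1 - weight) * q + weight * m for q, m in zip(query, mean)]


def generate(fused, rng, n_samples=8, noise=0.1):
    """Stand-in for a frozen generator: sample around the fused vector."""
    return [[x + rng.gauss(0.0, noise) for x in fused] for _ in range(n_samples)]


def refine(query, database, score, steps=10, k=3, seed=0):
    """Iteratively retrieve, fuse, generate, and keep improving candidates.

    Improved candidates replace the current input and are also added to
    the retrieval database, mirroring the dynamic update in the abstract.
    """
    rng = random.Random(seed)
    database = list(database)  # copy so the caller's list is not mutated
    best = query
    for _ in range(steps):
        exemplars = retrieve(best, database, k)
        candidates = generate(fuse(best, exemplars), rng)
        top = max(candidates, key=score)
        if score(top) > score(best):
            best = top
            database.append(top)
    return best, database


# Usage: the "design criterion" is closeness to a hypothetical target vector.
target = [1.0, 1.0]
score = lambda v: -math.dist(v, target)
exemplar_db = [[0.8, 0.9], [0.9, 0.7], [0.2, 0.1]]
best, updated_db = refine([0.0, 0.1], exemplar_db, score)
```

Because a candidate is accepted only when it scores higher than the current best, the returned vector never scores worse than the input. The generator itself is never retrained in this sketch, which loosely reflects the abstract's point that no task-specific fine-tuning is required.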