Paper title

Improving VAE based molecular representations for compound property prediction

Authors

Tevosyan, A., Khondkaryan, L., Khachatrian, H., Tadevosyan, G., Apresyan, L., Babayan, N., Stopper, H., Navoyan, Z.

Abstract

Collecting labeled data for many important tasks in chemoinformatics is time-consuming and requires expensive experiments. In recent years, machine learning has been used to learn rich representations of molecules from large-scale unlabeled molecular datasets and to transfer that knowledge to more challenging tasks with limited datasets. Variational autoencoders are one of the tools that have been proposed to perform this transfer for both chemical property prediction and molecular generation tasks. In this work we propose a simple method to improve the chemical property prediction performance of machine learning models by incorporating additional information on correlated molecular descriptors into the representations learned by variational autoencoders. We verify the method on three property prediction tasks. We explore the impact of the number of incorporated descriptors, the correlation between the descriptors and the target properties, the sizes of the datasets, and other factors. Finally, we show the relation between the performance of property prediction models and the distance, in the representation space, between the property prediction dataset and the larger unlabeled dataset.
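The abstract lists the correlation between descriptors and the target property, and the number of descriptors incorporated, as factors that were studied, but it does not say how descriptors were chosen. As one illustration of that idea (an assumption, not the authors' procedure), the sketch below ranks candidate descriptors by absolute Pearson correlation with the target on a labeled set and keeps the top k. The function names `pearson` and `top_k_descriptors`, the RDKit-style descriptor names, and all numbers are hypothetical toy values. The code is pure Python and does not compute descriptors itself.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant column carries no linear information about the target.
        return 0.0
    return cov / (sx * sy)


def top_k_descriptors(
    descriptors: Dict[str, Sequence[float]],
    target: Sequence[float],
    k: int,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: the k descriptors with the largest |r| against the target."""
    scored = [(name, pearson(values, target)) for name, values in descriptors.items()]
    # Rank by strength of correlation regardless of sign.
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy values for four molecules; not data from the paper.
    descriptors = {
        "MolLogP": [1.0, 2.0, 3.0, 4.0],
        "TPSA": [40.0, 32.0, 25.0, 10.0],
        "NumRings": [1.0, 1.0, 2.0, 1.0],
    }
    target = [0.5, 1.1, 1.4, 2.1]
    for name, r in top_k_descriptors(descriptors, target, k=2):
        print(f"{name}: r = {r:+.3f}")
```

Ranking by |r| keeps strongly anti-correlated descriptors such as TPSA in this toy set, since a negative linear relation is as informative as a positive one. In practice, the correlations should be computed on the training split only, so that descriptor selection does not leak information from the test set.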