Paper title
Assessing Graph-based Deep Learning Models for Predicting Flash Point
Paper authors
Abstract
Flash points of organic molecules play an important role in preventing flammability hazards. Large databases of measured values exist, but millions of compounds remain unmeasured. To rapidly extend existing data to new compounds, many researchers have used quantitative structure-property relationship (QSPR) analysis to predict flash points. In recent years, graph-based deep learning (GBDL) has emerged as a powerful alternative to traditional QSPR. In this paper, GBDL models are applied to flash point prediction for the first time. We assessed the performance of two GBDL models, message-passing neural network (MPNN) and graph convolutional neural network (GCNN), by comparing against 12 previous QSPR studies using more traditional methods. Our results show that MPNN both outperforms GCNN and yields slightly worse but comparable performance relative to previous QSPR studies. The average R² and mean absolute error (MAE) scores of MPNN are, respectively, 2.3% lower and 2.0 K higher than those of previous comparable studies. To further explore GBDL models, we collected the largest flash point dataset to date, which contains 10575 unique molecules. The optimized MPNN gives a test data R² of 0.803 and MAE of 17.8 K on the complete dataset. We also extracted 5 datasets from our integrated dataset based on molecular types (acids, organometallics, organogermaniums, organosilicons, and organotins) and explored the quality of the model in these classes.
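The two scores quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions, with R² taken as the coefficient of determination (1 - SS_res / SS_tot) and MAE as the mean absolute difference in kelvin. The abstract does not state which R² convention the authors used, and the function names and sample values below are hypothetical, not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute difference between measured and predicted values."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed convention)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up flash points in kelvin, for illustration only (not the paper's data).
measured = [300.0, 320.0, 340.0, 360.0]
predicted = [310.0, 315.0, 345.0, 350.0]

mae = mean_absolute_error(measured, predicted)
r2 = r2_score(measured, predicted)
print(f"MAE = {mae:.1f} K, R^2 = {r2:.3f}")
```

Under these definitions a lower MAE and a higher R² both indicate better agreement with measured flash points, which is the sense in which the abstract reports the MPNN as slightly worse than the earlier QSPR studies.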