Paper Title
AI-driven Hypergraph Network of Organic Chemistry: Network Statistics and Applications in Reaction Classification
Paper Authors
Paper Abstract
Recent advances in high-throughput screening, access to a much more complex chemical design space, and the development of accurate molecular modeling frameworks have accelerated the discovery of new reactions and molecules. The growing chemistry literature therefore calls for a holistic study that focuses on understanding recent trends and extrapolating them into possible future trajectories. To this end, several network-theory-based studies have been reported that use a directed graph representation of chemical reactions. Here, we instead represent chemical reactions as a hypergraph, in which hyperedges represent reactions and nodes represent the participating molecules. We use a standard reaction dataset to construct a hypernetwork and report its statistics, including degree distributions, average path length, assortativity (degree correlations), PageRank centrality, and graph-based clusters (communities). We also compute each statistic for an equivalent directed graph representation of the reactions to draw parallels and highlight differences between the two. To demonstrate the applicability of the hypergraph reaction representation to AI, we generate dense hypergraph embeddings and use them for reaction classification. We conclude that the hypernetwork representation is flexible, preserves reaction context, and uncovers insights that are not apparent in a traditional directed graph representation of chemical reactions.
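The contrast between the two representations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy reactions, the molecule labels, and the helper names `hypergraph_degrees` and `directed_edges` are all hypothetical. The directed projection used here (one edge from every reactant to every product) is an assumed convention, and the paper may define its directed graph differently.

```python
from collections import Counter
from itertools import product

# Hypothetical toy reactions (not from the paper's dataset):
# each reaction is a pair (set of reactants, set of products).
reactions = [
    ({"A", "B"}, {"C"}),
    ({"C", "D"}, {"E", "F"}),
    ({"A", "E"}, {"G"}),
]

def hypergraph_degrees(reactions):
    """Hypergraph node degree: number of hyperedges (reactions) containing the molecule."""
    deg = Counter()
    for reactants, products in reactions:
        for mol in reactants | products:
            deg[mol] += 1
    return deg

def directed_edges(reactions):
    """Assumed directed projection: one edge from every reactant to every product."""
    edges = set()
    for reactants, products in reactions:
        edges.update(product(reactants, products))
    return edges

hyper_deg = hypergraph_degrees(reactions)
edges = directed_edges(reactions)
out_deg = Counter(src for src, _ in edges)
in_deg = Counter(dst for _, dst in edges)

print("hypergraph degrees:", dict(hyper_deg))
print("directed edges:", sorted(edges))
print("out-degrees:", dict(out_deg))
print("in-degrees:", dict(in_deg))
```

In this sketch each hyperedge keeps the full set of molecules in a reaction together. The pairwise directed edges no longer record which reactants and products appeared in the same reaction. That is one way to read the abstract's claim that the hypergraph preserves reaction context.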