Paper title
Dictionary of 140k GDB and ZINC derived AMONs
Paper authors
Paper abstract
We present all {\bf A}mons for the {\bf G}DB and {\bf Z}inc databases using no more than 7 non-hydrogen atoms (AGZ7), a calculated organic chemistry building-block dictionary based on the AMON approach [Huang and von Lilienfeld, {\em Nature Chemistry} (2020)]. AGZ7 records Cartesian coordinates of compositional and constitutional isomers, as well as properties, for $\sim$140k small organic molecules obtained by systematically fragmenting all molecules of ZINC and the majority of GDB-17 into smaller entities that contain no more than 7 heavy (non-hydrogen) atoms and are saturated with hydrogens. AGZ7 covers the elements \{H, B, C, N, O, F, Si, P, S, Cl, Br, Sn, and I\} and includes optimized geometries, total energy and its decomposition, Mulliken atomic charges, dipole moment vectors, quadrupole tensors, electronic spatial extent, eigenvalues of all occupied orbitals, LUMO, gap, isotropic polarizability, harmonic frequencies, reduced masses, force constants, IR intensities, normal coordinates, rotational constants, zero-point energy, internal energy, enthalpy, entropy, free energy, and heat capacity (all at ambient conditions), computed at the B3LYP/cc-pVTZ level of theory (pseudopotentials were used for Sn and I). We exemplify the usefulness of this data set with AMON-based machine learning models that predict the total potential energies of seven of the most rigid GDB-17 molecules.