Paper title
Critical Benchmarking of the G4(MP2) Model, the Correlation Consistent Composite Approach and Popular Density Functional Approximations on a Probabilistically Pruned Benchmark Dataset of Formation Enthalpies
Paper authors
Paper abstract
First-principles calculation of the standard formation enthalpy, $ΔH_f^\circ$ (298 K), at the scale required for chemical space exploration is feasible only with density functional approximations (DFAs) and some composite wave function theories (cWFTs). However, the accuracies of popular range-separated hybrid, `rung-4' DFAs, and cWFTs that offer the best accuracy-vs.-cost trade-off have so far been established only for datasets predominantly comprising small molecules; hence, their transferability to larger datasets remains unclear. In this study, we present an extended benchmark dataset of over 1600 values of $ΔH_f^\circ$ for structurally and electronically diverse molecules. We apply quartile ranking based on boundary-corrected kernel density estimation to filter outliers and arrive at Probabilistically Pruned Enthalpies of 1694 compounds (PPE1694). For this dataset, we rank the prediction accuracies of G4, G4(MP2), ccCA, CBS-QB3 and 23 popular DFAs using conventional and probabilistic error metrics. We discuss systematic prediction errors and highlight the role that an empirical higher-level correction (HLC) plays in the G4(MP2) model. Furthermore, we comment on the uncertainties associated with the reference empirical data for atoms and on the resulting systematic errors, which grow with molecular size. We believe these findings will aid in identifying meaningful application domains for quantum thermochemical methods.
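
The remark on atomic reference data can be illustrated with the atomization-energy route that is commonly used to convert computed energies into formation enthalpies. The abstract does not state which route the authors follow, so the expression below is an assumption, and the symbols $n_A$, $\Sigma D_0$ and $\delta_A$ are introduced here purely for illustration.

$$
\Delta H_f^\circ(M, 298\,\mathrm{K}) = \sum_A n_A\, \Delta H_f^\circ(A, 0\,\mathrm{K}) - \Sigma D_0(M) + \left[ H_M(298\,\mathrm{K}) - H_M(0\,\mathrm{K}) \right] - \sum_A n_A \left[ H_A(298\,\mathrm{K}) - H_A(0\,\mathrm{K}) \right]
$$

Here $A$ runs over the elements in molecule $M$, $n_A$ is the number of atoms of element $A$, $\Sigma D_0(M)$ is the computed atomization energy, $\Delta H_f^\circ(A, 0\,\mathrm{K})$ is the empirical formation enthalpy of the gaseous atom, and the last bracket is the thermal enthalpy increment of element $A$ in its standard state. If the empirical atomic value for element $A$ carries an error $\delta_A$, the error it induces in the molecular value under this scheme is

$$
\delta\!\left[ \Delta H_f^\circ(M) \right] = \sum_A n_A\, \delta_A ,
$$

which grows linearly with the atom count. For an alkane $\mathrm{C}_n\mathrm{H}_{2n+2}$, for instance, it equals $n\,\delta_\mathrm{C} + (2n+2)\,\delta_\mathrm{H}$.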