Paper title
Low cost prediction of probability distributions of molecular properties for early virtual screening
Paper authors
Abstract
Predictive models usually focus on point values, but it is mathematically more appropriate to predict probability distributions, which additionally provide uncertainty estimates, higher moments, and quantiles. For computer-aided drug design, this article applies the Hierarchical Correlation Reconstruction approach, previously used in the analysis of demographic, financial, and astronomical data. Instead of a single linear regression that predicts a value, it uses multiple linear regressions to independently predict multiple moments and then combines them into a predicted probability distribution, here of several ADMET properties based on the substructural fingerprint developed by Klekota & Roth. The application example discussed is the inexpensive selection, during virtual screening, of a percentage of molecules whose properties are nearly certain to lie in a predicted or chosen range. Such an approach can facilitate interpretation of the results, because predictions characterized by high uncertainty are detected automatically. In addition, for each of the investigated predictive problems, we detected crucial structural features that should be carefully considered when optimizing compounds towards a particular property. The methodology developed in the study therefore offers substantial support for medicinal chemists, as it enables fast rejection of compounds with the lowest potential for the desired physicochemical/ADMET characteristics and guides the compound optimization process.
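The idea of predicting several moments with separate linear regressions and combining them into a density can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`basis`, `fit`, `predict_density`, `prob_in_range`), the rescaled Legendre basis, the rank-based normalization of the property to [0, 1], and the clipping threshold `eps` are all choices made here for illustration. `X` stands in for a binary substructural fingerprint matrix.

```python
import numpy as np
from numpy.polynomial import legendre


def basis(u, degree):
    """Legendre polynomials rescaled to be orthonormal on [0, 1], j = 1..degree."""
    u = np.asarray(u, dtype=float)
    cols = []
    for j in range(1, degree + 1):
        coef = np.zeros(j + 1)
        coef[j] = 1.0  # select the j-th Legendre polynomial
        cols.append(np.sqrt(2 * j + 1) * legendre.legval(2 * u - 1, coef))
    return np.stack(cols, axis=-1)


def fit(X, y, degree=4):
    """One least-squares regression per moment-like coefficient (assumed setup)."""
    y = np.asarray(y, dtype=float)
    sorted_y = np.sort(y)
    # Normalize the property to roughly uniform on (0, 1) via its empirical CDF.
    u = (np.searchsorted(sorted_y, y, side="right") - 0.5) / len(y)
    A = np.column_stack([np.ones(len(y)), X])  # intercept + fingerprint bits
    beta, *_ = np.linalg.lstsq(A, basis(u, degree), rcond=None)
    return beta, sorted_y


def predict_density(beta, x, cells=1000, eps=0.1):
    """Combine predicted coefficients into a density on the normalized axis."""
    grid = (np.arange(cells) + 0.5) / cells  # cell midpoints in (0, 1)
    a = np.concatenate(([1.0], np.asarray(x, dtype=float))) @ beta
    rho = 1.0 + basis(grid, beta.shape[1]) @ a
    rho = np.maximum(rho, eps)  # assumed calibration: clip negative densities
    return grid, rho / rho.mean()  # renormalize so the density integrates to 1


def prob_in_range(beta, sorted_y, x, lo, hi):
    """Predicted probability that the property falls in [lo, hi]."""
    n = len(sorted_y)
    u_lo = np.searchsorted(sorted_y, lo, side="left") / n
    u_hi = np.searchsorted(sorted_y, hi, side="right") / n
    grid, rho = predict_density(beta, x)
    inside = (grid >= u_lo) & (grid <= u_hi)
    return rho[inside].sum() / len(grid)
```

In a screening setting, one could keep only molecules for which `prob_in_range` exceeds a chosen threshold. A flat predicted density (all coefficients near zero) would indicate a prediction with high uncertainty.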