Paper title
Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design
Paper authors
Abstract
Our world is ambiguous, and this is reflected in the data we use to train our algorithms. This is particularly true when we try to model natural processes where collected data is affected by noisy measurements and differences in measurement techniques. Sometimes the process itself is ambiguous, as in RNA folding, where the same nucleotide sequence can fold into different structures. This suggests that a predictive model should have similar probabilistic characteristics to match the data it models. Therefore, we propose a hierarchical latent distribution to enhance one of the most successful deep learning models, the Transformer, to accommodate ambiguities and data distributions. We show the benefits of our approach (1) on a synthetic task that captures the ability to learn a hidden data distribution, (2) with state-of-the-art results in RNA folding that reveal advantages on highly ambiguous data, and (3) on property-based molecule design, where it demonstrates its generative capabilities by implicitly learning the underlying distributions and outperforming existing work.