Paper Title

Few Shot Protein Generation

Authors

Soumya Ram, Tristan Bepler

Abstract

We present the MSA-to-protein transformer, a generative model of protein sequences conditioned on protein families represented by multiple sequence alignments (MSAs). Unlike existing approaches to learning generative models of protein families, the MSA-to-protein transformer conditions sequence generation directly on a learned encoding of the multiple sequence alignment, circumventing the need for fitting dedicated family models. By training on a large set of well-curated multiple sequence alignments in Pfam, our MSA-to-protein transformer generalizes well to protein families not observed during training and outperforms conventional family modeling approaches, especially when MSAs are small. Our generative approach accurately models epistasis and indels and allows for exact inference and efficient sampling unlike other approaches. We demonstrate the protein sequence modeling capabilities of our MSA-to-protein transformer and compare it with alternative sequence modeling approaches in comprehensive benchmark experiments.
