Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction

Paper title

Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction

Authors

Zhang, Jun; Liu, Sirui; Chen, Mengyun; Chu, Haotian; Wang, Min; Wang, Zidong; Yu, Jialiang; Ni, Ningxi; Yu, Fan; Chen, Diqing; Yang, Yi Isaac; Xue, Boxin; Yang, Lijiang; Liu, Yuan; Gao, Yi Qin

Abstract

Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development. Determining an accurate folding landscape from co-evolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised accuracy without performing explicit co-evolutionary analysis. Nevertheless, its performance still depends strongly on the available sequence homologs. Based on an investigation into the cause of this dependence, we present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 on targets with poor multiple sequence alignments (MSAs). By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime and even achieve encouraging performance with single-sequence predictions. Being able to make accurate predictions with few-shot MSAs not only generalizes AlphaFold2 better to orphan sequences, but also democratizes its use in high-throughput applications. In addition, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that can explore alternative conformations of protein sequences, and the task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.

Paper title