What cleaves? Is proteasomal cleavage prediction reaching a ceiling?

Paper title

What cleaves? Is proteasomal cleavage prediction reaching a ceiling?

Authors

Ingo Ziegler, Bolei Ma, Ercong Nie, Bernd Bischl, David Rügamer, Benjamin Schubert, Emilio Dorigatti

Abstract

Epitope vaccines are a promising direction to enable precision treatment for cancer, autoimmune diseases, and allergies. Effectively designing such vaccines requires accurate prediction of proteasomal cleavage in order to ensure that the epitopes in the vaccine are presented to T cells by the major histocompatibility complex (MHC). While direct identification of proteasomal cleavage in vitro is cumbersome and low throughput, it is possible to implicitly infer cleavage events from the termini of MHC-presented epitopes, which can be detected in large amounts thanks to recent advances in high-throughput MHC ligandomics. Inferring cleavage events in such a way provides an inherently noisy signal which can be tackled with new developments in the field of deep learning that supposedly make it possible to learn predictors from noisy labels. Inspired by such innovations, we sought to modernize proteasomal cleavage predictors by benchmarking a wide range of recent methods, including LSTMs, transformers, CNNs, and denoising methods, on a recently introduced cleavage dataset. We found that increasing model scale and complexity appeared to deliver limited performance gains, as several methods reached about 88.5% AUC on C-terminal and 79.5% AUC on N-terminal cleavage prediction. This suggests that the noise and/or complexity of proteasomal cleavage and the subsequent biological processes of the antigen processing pathway are the major limiting factors for predictive performance rather than the specific modeling approach used. While biological complexity can be tackled by more data and better models, noise and randomness inherently limit the maximum achievable predictive performance. All our datasets and experiments are available at https://github.com/ziegler-ingo/cleavage_prediction.
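The abstract's closing claim, that label noise caps the achievable AUC regardless of the model, can be illustrated with a toy simulation. The sketch below is not the authors' code and does not use their data: the symmetric 15% label-flip rate, the sample size, and the helper names `auc` and `flip_labels` are all assumptions chosen for illustration. It scores an "oracle" predictor that knows the true labels perfectly, first against the true labels and then against noisy ones.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def flip_labels(labels, rate, rng):
    """Flip each binary label independently with probability `rate`
    (a hypothetical symmetric noise model, not the paper's)."""
    return [1 - y if rng.random() < rate else y for y in labels]


rng = random.Random(0)

# Hypothetical ground truth: 1 = cleavage site, 0 = no cleavage.
true_labels = [rng.randint(0, 1) for _ in range(400)]

# An oracle predictor that scores every site exactly right.
oracle_scores = [float(y) for y in true_labels]

# Observed labels, e.g. as inferred indirectly, with assumed 15% noise.
noisy_labels = flip_labels(true_labels, 0.15, rng)

auc_clean = auc(oracle_scores, true_labels)
auc_noisy = auc(oracle_scores, noisy_labels)

print(f"oracle AUC vs. true labels:  {auc_clean:.3f}")
print(f"oracle AUC vs. noisy labels: {auc_noisy:.3f}")
```

Even a perfect predictor falls short of 100% AUC when it is evaluated against noisy labels, so a benchmark plateau shared by many architectures is consistent with a noise ceiling. The flip rate here is arbitrary and says nothing about the actual noise level in the paper's dataset.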
