Title
Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction
Authors
Abstract
The characterization of drug-protein interactions (DPI) is crucial in high-throughput screening for drug discovery. Deep learning-based approaches have attracted attention because they can predict drug-protein interactions without human trial and error. However, because data labeling requires significant resources, the available protein dataset is relatively small, which degrades model performance. Here we propose two methods for constructing a deep learning framework that exhibits superior performance with a small labeled dataset. First, we use transfer learning to encode protein sequences with a pretrained model, which learns general sequence representations in an unsupervised manner. Second, we use a Bayesian neural network to make the model robust by estimating the data uncertainty. As a result, our model performs better than previous baselines at predicting drug-protein interactions. We also show that the uncertainty quantified by Bayesian inference is related to the prediction confidence and can be used for screening DPI data points.
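
To make the idea of uncertainty-based screening concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a Bayesian classifier has already produced several stochastic predictions of the interaction probability for each drug-protein pair. It splits the predictive variance into a data (aleatoric) term and a model (epistemic) term using a common decomposition for binary outputs, then keeps only pairs whose epistemic term falls below a threshold. The function names, the pair labels, the threshold value, and the choice to screen on the epistemic term are all hypothetical.

```python
from statistics import fmean


def summarize_uncertainty(samples):
    """Summarize stochastic predictions for one drug-protein pair.

    `samples` is a list of predicted interaction probabilities, one per
    stochastic forward pass (assumed to come from some Bayesian model).
    Returns (mean probability, aleatoric term, epistemic term).
    """
    mean_p = fmean(samples)
    # Data (aleatoric) term: average Bernoulli variance of each sample.
    aleatoric = fmean([p * (1.0 - p) for p in samples])
    # Model (epistemic) term: spread of the samples around their mean.
    epistemic = fmean([(p - mean_p) ** 2 for p in samples])
    return mean_p, aleatoric, epistemic


def screen(candidates, max_epistemic=0.01):
    """Keep pairs whose epistemic term is at most `max_epistemic`.

    `candidates` maps a pair label to its list of sampled probabilities.
    The threshold is an illustrative value, not taken from the paper.
    """
    kept = []
    for name, samples in candidates.items():
        mean_p, _, epistemic = summarize_uncertainty(samples)
        if epistemic <= max_epistemic:
            kept.append((name, mean_p))
    return kept


# Hypothetical sampled predictions for two drug-protein pairs.
candidates = {
    "drugA-protX": [0.90, 0.92, 0.88],  # passes agree closely
    "drugB-protY": [0.10, 0.90, 0.50],  # passes disagree strongly
}
kept = screen(candidates)
```

With this decomposition the two terms always sum to the Bernoulli variance of the mean prediction, so thresholding their sum would only restate the mean probability. That is why the sketch screens on the epistemic term alone.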