Paper Title

Bridging the Gaps in Statistical Models of Protein Alignment

Authors

Dinithi Sumanaweera, Lloyd Allison, Arun S. Konagurthu

Abstract

This work demonstrates how a complete statistical model quantifying the evolution of pairs of aligned proteins can be constructed from a time-parameterised substitution matrix and a time-parameterised 3-state alignment machine. All parameters of such a model can be inferred from any benchmark dataset of aligned protein sequences. This allows us to examine nine well-known substitution matrices on six benchmarks curated using various structural alignment methods; any matrix that does not explicitly model a "time"-dependent Markov process is converted to a corresponding base matrix that does. In addition, a new optimal matrix is inferred for each of the six benchmarks. Using Minimum Message Length (MML) inference, all 15 matrices are compared in terms of the Shannon information content they measure for each benchmark. The result is a clear overall best-performing time-dependent Markov matrix, MMLSUM, together with its associated 3-state machine, whose properties we analyse in this work. For standard use, the MMLSUM series of (log-odds) scoring matrices derived from this Markov matrix is available at https://lcb.infotech.monash.edu.au/mmlsum.
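To make the phrase "time-dependent Markov matrix" concrete, the sketch below shows one common way such a matrix yields a series of log-odds scoring matrices: raise a base matrix to the power t, then compare each conditional probability with the stationary probability of the target symbol. This is a minimal illustration and not the authors' implementation. The 3-letter alphabet, the values in `BASE`, and all function names are assumptions made for the example; a real protein matrix would be 20 x 20 and inferred from benchmark data.

```python
import math

# Hypothetical row-stochastic base matrix over a toy 3-letter alphabet.
# BASE[i][j] is the assumed probability that symbol i becomes symbol j
# in one unit of "time". The numbers are illustrative, not from MMLSUM.
BASE = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.01, 0.02, 0.97],
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(m, t):
    """Return m raised to the integer power t >= 0 by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while t > 0:
        if t & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        t >>= 1
    return result


def stationary(m, iterations=10000):
    """Approximate the stationary distribution pi (pi * m = pi) by power iteration."""
    n = len(m)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * m[i][j] for i in range(n)) for j in range(n)]
    return pi


def log_odds(m, t):
    """Return scores in bits: log2(P(j | i, t) / pi_j) for every pair (i, j)."""
    mt = matrix_power(m, t)
    pi = stationary(m)
    n = len(m)
    return [[math.log2(mt[i][j] / pi[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for t in (1, 50, 200):
        print(t, [round(s, 3) for s in log_odds(BASE, t)[0]])
```

Under these assumptions the diagonal scores shrink toward zero as t grows, because the rows of the powered matrix approach the stationary distribution; this is why a single base matrix gives rise to a whole series of scoring matrices indexed by evolutionary distance.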
