Title

Correlation-based feature selection to identify functional dynamics in proteins

Authors

Diez, Georg; Nagel, Daniel; Stock, Gerhard

Abstract

To interpret molecular dynamics simulations of biomolecular systems, systematic dimensionality reduction methods are commonly employed. Among others, this includes principal component analysis (PCA) and time-lagged independent component analysis (TICA), which aim to maximize the variance and the timescale of the first components, respectively. A crucial first step of such an analysis is the identification of suitable and relevant input coordinates (the so-called features), such as backbone dihedral angles and interresidue distances. As typically only a small subset of those coordinates is involved in a specific biomolecular process, it is important to discard the remaining uncorrelated motions or weakly correlated noise coordinates. This is because they may exhibit large amplitudes or long timescales and therefore will erroneously be considered important by PCA and TICA, respectively. To discriminate collective motions underlying functional dynamics from uncorrelated motions, the correlation matrix of the input coordinates is block-diagonalized by a clustering method. This strategy avoids possible bias due to presumed functional observables and conformational states or variation principles that maximize variance or timescales. Considering several linear and nonlinear correlation measures and various clustering algorithms, it is shown that the combination of linear correlation and the Leiden community detection algorithm yields excellent results for all considered model systems. These include the functional motion of T4 lysozyme to demonstrate the successful identification of collective motion, as well as the folding of villin headpiece to highlight the physical interpretation of the correlated motions in terms of a functional mechanism.
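The core idea of the abstract (compute the correlation matrix of the input features, split it into blocks of mutually correlated coordinates, and discard coordinates that correlate with nothing) can be illustrated with a small sketch. It is not the authors' implementation. Assumptions: the linear measure is the Pearson coefficient; in place of the Leiden community detection algorithm used in the paper, it substitutes a much simpler stand-in, namely linking two features when |r| exceeds a hypothetical `threshold` and taking connected components of that graph as blocks. The function names (`pearson`, `correlation_blocks`, `make_demo_features`) and the toy data are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear (Pearson) correlation of two equally long feature time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # a constant coordinate carries no correlation information
    return cov / math.sqrt(vx * vy)


def correlation_blocks(
    features: List[Sequence[float]], threshold: float = 0.5
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return the correlation matrix and the feature blocks it decomposes into.

    Stand-in for community detection (NOT Leiden): features i and j are linked
    when |r_ij| >= threshold; blocks are connected components of that graph.
    """
    m = len(features)
    corr = [[pearson(features[i], features[j]) for j in range(m)] for i in range(m)]
    seen = [False] * m
    blocks: List[List[int]] = []
    for start in range(m):
        if seen[start]:
            continue
        seen[start] = True
        stack, block = [start], []
        while stack:  # depth-first search over strongly correlated neighbours
            i = stack.pop()
            block.append(i)
            for j in range(m):
                if not seen[j] and abs(corr[i][j]) >= threshold:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(block))
    return corr, blocks


def make_demo_features(n_frames: int = 2000, seed: int = 0) -> List[List[float]]:
    """Hypothetical toy trajectory: two collective motions plus one noise coordinate."""
    rng = random.Random(seed)
    s1 = [rng.gauss(0.0, 1.0) for _ in range(n_frames)]  # collective motion 1
    s2 = [rng.gauss(0.0, 1.0) for _ in range(n_frames)]  # collective motion 2

    def noisy(signal: List[float]) -> List[float]:
        return [v + 0.2 * rng.gauss(0.0, 1.0) for v in signal]

    feats = [noisy(s1), noisy(s1), noisy(s1), noisy(s2), noisy(s2)]
    # Large-amplitude but uncorrelated coordinate: high variance, so PCA would
    # rank it highly, yet it belongs to no collective motion.
    feats.append([rng.gauss(0.0, 3.0) for _ in range(n_frames)])
    return feats


if __name__ == "__main__":
    corr, blocks = correlation_blocks(make_demo_features(), threshold=0.5)
    relevant = [b for b in blocks if len(b) > 1]  # drop singleton (uncorrelated) features
    print("blocks:", blocks)
    print("kept for PCA/TICA:", relevant)
```

On this toy data the three copies of the first signal and the two copies of the second should form separate blocks, while the high-variance noise coordinate ends up alone and is dropped before PCA or TICA. A real analysis would replace the connected-components step with a proper community detection method such as Leiden, which works on the weighted correlation graph rather than a hard cutoff.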