Paper Title

Bayesian Chain Graph LASSO Models to Learn Sparse Microbial Networks with Predictors

Authors

Shen, Yunyi; Solis-Lemus, Claudia

Abstract

Microbiome data require statistical models that can simultaneously decode microbes' reactions to the environment and the interactions among microbes. A multiresponse linear regression model may look like a straightforward solution. However, we argue that treating it as a graphical model is flawed: the regression coefficient matrix is not the adjacency matrix, so it does not encode the conditional dependence structure between response and predictor nodes. This observation matters most in biological settings where specific experimental interventions give us prior knowledge about edges, because that knowledge can only be properly encoded under a conditional dependence model. Here, we propose a chain graph model with two sets of nodes (predictors and responses). Its solution yields a graph whose edges indeed represent conditional dependence, and it therefore agrees with the experimenter's intuition about the average behavior of nodes under treatment. A Bayesian LASSO makes the solution to our model sparse. In addition, we propose an adaptive extension that applies different shrinkage to different edges, which lets us incorporate edge-specific prior knowledge. An efficient Gibbs sampling algorithm keeps our model computationally inexpensive, and an appropriate hierarchical structure lets it handle binary, count and compositional responses. We apply our model to a human gut dataset and a soil microbial compositional dataset, and we show that CG-LASSO can estimate biologically meaningful network structures in the data. The CG-LASSO software is available as an R package at https://github.com/YunyiShen/CAR-LASSO.
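The abstract's central argument is that the coefficients of a multiresponse regression do not form the adjacency matrix of the predictor-response graph. A small numerical sketch makes this concrete. It assumes a Gaussian chain graph in which y | x ~ N(Ω⁻¹Bᵀx, Ω⁻¹). Here Ω is the precision matrix among responses and B holds the predictor-to-response edge weights, so B[j, l] = 0 means predictor j and response l are conditionally independent given all other nodes. This parametrization, the toy matrices and the variable names are illustrative assumptions. The abstract does not spell out the likelihood, and this is not code from the CAR-LASSO package.

```python
import numpy as np

# Hypothetical toy chain graph: 1 predictor (x) and 3 responses (y1, y2, y3).
# Assumed parametrization: y | x ~ N(Omega^{-1} B^T x, Omega^{-1}).
Omega = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])  # responses form a chain: y1 - y2 - y3
B = np.array([[1.0, 0.0, 0.0]])       # x -> y1 is the only predictor edge

Sigma = np.linalg.inv(Omega)          # residual covariance of the responses
C = B @ Sigma                         # coefficients of the regression Y = X C + E

print("adjacency B: ", B.ravel())
print("regression C:", C.round(3).ravel())

# Check by simulation: ordinary least squares recovers C, not B.
rng = np.random.default_rng(0)
n = 20_000
X = rng.standard_normal((n, 1))
Y = X @ C + rng.multivariate_normal(np.zeros(3), Sigma, size=n)
C_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("OLS estimate:", C_hat.round(3).ravel())
```

The predictor has a single conditional edge, to y1. Even so, every entry of the regression row C = BΩ⁻¹ is nonzero (about 0.62, 0.29 and 0.12), because the effect on y1 spreads along the response chain. Reading C as an adjacency matrix would wrongly add the edges x → y2 and x → y3.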
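The abstract makes three claims about the machinery: a Bayesian LASSO produces sparsity, an adaptive extension applies different shrinkage to different edges, and inference runs through Gibbs sampling. The CG-LASSO sampler itself is not described here. The sketch below therefore substitutes the standard single-response Bayesian LASSO Gibbs sampler of Park and Casella (2008) and gives each coefficient its own penalty λ_j to mimic edge-specific shrinkage. Several details are assumptions made for illustration: the function name `adaptive_bayes_lasso`, its arguments, the 1/σ² prior, the absence of an intercept and the choice to hold each λ_j fixed. This is a sketch of the technique, not the authors' algorithm.

```python
import numpy as np


def adaptive_bayes_lasso(X, y, lam, n_iter=2000, burn_in=500, seed=0):
    """Hypothetical Gibbs sampler for a single-response Bayesian LASSO
    (Park & Casella style) with a fixed, positive penalty lam[j] per coefficient.

    Assumed model (no intercept):
      y | beta, s2      ~ N(X beta, s2 I)
      beta_j | s2, t2_j ~ N(0, s2 * t2_j)
      t2_j              ~ Exponential(rate = lam_j**2 / 2)
      p(s2)             proportional to 1 / s2
    Returns the post-burn-in draws of beta, shape (n_iter - burn_in, p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    lam2 = np.broadcast_to(np.asarray(lam, dtype=float), (p,)) ** 2
    XtX, Xty = X.T @ X, X.T @ y
    s2, inv_t2 = 1.0, np.ones(p)
    draws = np.empty((n_iter - burn_in, p))
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, s2 A^{-1}) with A = X'X + diag(1 / t2)
        A = XtX + np.diag(inv_t2)
        L = np.linalg.cholesky(A)  # A = L L'
        mean = np.linalg.solve(A, Xty)
        z = rng.standard_normal(p)
        beta = mean + np.sqrt(s2) * np.linalg.solve(L.T, z)  # cov = s2 A^{-1}
        # s2 | rest ~ InvGamma((n + p) / 2, (RSS + sum(beta^2 / t2)) / 2)
        resid = y - X @ beta
        rate = 0.5 * (resid @ resid + np.sum(beta**2 * inv_t2))
        s2 = 1.0 / rng.gamma(0.5 * (n + p), 1.0 / rate)
        # 1 / t2_j | rest ~ InverseGaussian(mean = sqrt(lam_j^2 s2 / beta_j^2), shape = lam_j^2)
        inv_t2 = rng.wald(np.sqrt(lam2 * s2 / beta**2), lam2)
        if it >= burn_in:
            draws[it - burn_in] = beta
    return draws


# Usage on simulated data with a sparse true coefficient vector.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ beta_true + rng.standard_normal(200)

post = adaptive_bayes_lasso(X, y, lam=1.0).mean(axis=0)
# Edge-specific prior knowledge: penalise the first coefficient much harder.
post_heavy = adaptive_bayes_lasso(X, y, lam=[100.0, 1, 1, 1, 1]).mean(axis=0)
print(post.round(2), post_heavy.round(2))
```

Raising λ_j for one coefficient pulls that coefficient's posterior toward zero much harder than the others. This per-coefficient penalty is the intuition behind encoding edge-specific prior knowledge through per-edge shrinkage. In this simplified sketch the shared σ² still couples the coefficients, so the effect on the others is small but not exactly zero.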
