Paper Title

It's All Relative: New Regression Paradigm for Microbiome Compositional Data

Authors

Gen Li, Yan Li, Kun Chen

Abstract

Microbiome data are complex in nature, involving high dimensionality, compositionality, zero inflation, and taxonomic hierarchy. Compositional data reside in a simplex that does not admit the standard Euclidean geometry. Most existing compositional regression methods rely on transformations that are inadequate or even inappropriate for modeling data with excessive zeros and taxonomic structure. We develop a novel relative-shift regression framework that directly uses compositions as predictors. The new framework provides a paradigm shift for compositional regression and offers a superior biological interpretation. New equi-sparsity and taxonomy-guided regularization methods and an efficient smoothing proximal gradient algorithm are developed to facilitate feature aggregation and dimension reduction in regression. As a result, the framework can automatically identify clinically relevant microbes even if they are important at different taxonomic levels. A unified finite-sample prediction error bound is developed for the proposed regularized estimators. We demonstrate the efficacy of the proposed methods in extensive simulation studies. The application to a preterm infant study reveals novel insights into the association between the gut microbiome and neurodevelopment.
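
Two points in the abstract can be illustrated with a small numerical sketch: log-based transformations break down when a composition contains zeros, and taxa that share a regression coefficient can be summed into a single feature without changing the fit. The sketch below assumes a plain linear predictor on the raw compositions. The toy data and coefficient values are hypothetical, and this is not the authors' estimator or their regularization algorithm, which the abstract does not specify.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 3 taxa. Each row sums to 1 and contains zeros.
X = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.0, 0.8],
    [0.0, 0.3, 0.7],
    [0.6, 0.1, 0.3],
])

# Log-ratio style transformations require log(x), which is undefined at zero.
with np.errstate(divide="ignore"):
    log_X = np.log(X)
has_infinite = bool(np.isinf(log_X).any())

# A linear predictor that uses the compositions directly (illustrative coefficients).
# Taxa 0 and 1 are given the same coefficient on purpose.
beta = np.array([1.0, 1.0, -2.0])
eta = X @ beta

# Because the coefficients of taxa 0 and 1 are equal, the two taxa can be
# aggregated into one feature (for example, their common parent in the taxonomy).
X_agg = np.column_stack([X[:, 0] + X[:, 1], X[:, 2]])
beta_agg = np.array([1.0, -2.0])
eta_agg = X_agg @ beta_agg

print("log transform produces infinite values:", has_infinite)
print("aggregated predictor matches original:", np.allclose(eta, eta_agg))
```

The second check is the intuition behind feature aggregation: a penalty that encourages coefficients to be equal effectively merges the corresponding taxa, which reduces the dimension of the regression.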
