Paper title
Principal Balances of Compositional Data for Regression and Classification using Partial Least Squares
Paper authors
Paper abstract
High-dimensional compositional data are commonplace in the modern omics sciences, among other fields. Analysis of compositional data requires a proper choice of orthonormal coordinate representation, as their relative nature is not compatible with the direct use of standard statistical methods. Principal balances, a specific class of log-ratio coordinates, are well suited to this context since they are constructed in such a way that the first few coordinates capture most of the variability in the original data. Focusing on regression and classification problems in high dimensions, we propose a novel Partial Least Squares (PLS) based procedure to construct principal balances that maximize the explained variability of the response variable; the procedure notably facilitates interpretability when compared to the ordinary PLS formulation. The proposed PLS principal balance approach can be understood as a generalized version of common logcontrast models, since multiple orthonormal logcontrasts (instead of one) are estimated simultaneously. We demonstrate the performance of the method using both simulated and real data sets.
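To make the terms "balance" and "logcontrast" in the abstract concrete, the following is a minimal Python sketch of a single balance coordinate in its standard textbook form: a normalized log-ratio of the geometric means of two groups of parts. It is not the PLS-based procedure proposed in the paper. The function names `balance_contrast` and `balance`, and the example composition, are hypothetical and chosen only for illustration.

```python
import math

def balance_contrast(num_idx, den_idx, n_parts):
    """Coefficient vector of one balance (hypothetical helper).

    Parts in num_idx get a positive weight, parts in den_idx a negative
    weight, all other parts get zero. The weights sum to zero (so the
    vector defines a logcontrast) and the vector has unit Euclidean norm.
    """
    r, s = len(num_idx), len(den_idx)
    v = [0.0] * n_parts
    for i in num_idx:
        v[i] = math.sqrt(s / (r * (r + s)))
    for j in den_idx:
        v[j] = -math.sqrt(r / (s * (r + s)))
    return v

def balance(x, num_idx, den_idx):
    """Balance coordinate of composition x: inner product of log(x) with the contrast."""
    v = balance_contrast(num_idx, den_idx, len(x))
    return sum(c * math.log(xi) for c, xi in zip(v, x))

# Illustrative composition with four strictly positive parts (made-up values).
x = [0.1, 0.2, 0.3, 0.4]
b = balance(x, num_idx=[0, 1], den_idx=[2])

# Rescaling the composition leaves the balance unchanged, because the
# coefficients sum to zero; only relative information is used.
b_scaled = balance([5.0 * xi for xi in x], num_idx=[0, 1], den_idx=[2])
```

Because the coefficients sum to zero, multiplying every part by the same constant does not change the coordinate, which is the sense in which a balance respects the relative nature of compositional data. Parts with zero weight (here the fourth part) do not enter the coordinate at all, which is what makes balances easier to interpret than dense log-ratio loadings.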