Paper Title
SDGCCA: Supervised Deep Generalized Canonical Correlation Analysis for Multi-omics Integration
Authors
Abstract
Integration of multi-omics data provides opportunities for revealing biological mechanisms related to certain phenotypes. We propose a novel multi-omics integration method, supervised deep generalized canonical correlation analysis (SDGCCA), which models correlation structures between nonlinear multi-omics manifolds with the aim of improving phenotype classification and revealing phenotype-related biomarkers. SDGCCA addresses the limitations of other canonical correlation analysis (CCA)-based models (e.g., deep CCA, deep generalized CCA) by considering complex/nonlinear cross-data correlations and discriminating phenotype groups. Although a few methods exist for nonlinear CCA projection with a discriminative objective, they consider only two views. SDGCCA, in contrast, is a nonlinear multiview CCA projection method for discrimination. When applied to the prediction of patients with Alzheimer's disease (AD) and to the discrimination of early- and late-stage cancers, SDGCCA outperformed other CCA-based methods and other supervised methods. In addition, we demonstrate that SDGCCA can be used for feature selection to identify important multi-omics biomarkers. In the application to AD data, SDGCCA identified clusters of genes in multi-omics data that are well known to be associated with AD.
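As a rough illustration of the generalized CCA idea that the abstract builds on, the sketch below implements plain linear generalized CCA (the MAX-VAR formulation) with NumPy. It is not SDGCCA itself: there are no neural networks, no label view, and no classification loss. The function name `linear_gcca`, the ridge term `reg`, and the synthetic three-view data are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def linear_gcca(views, k=2, reg=1e-3):
    """Linear generalized CCA (MAX-VAR form), illustrative only.

    views: list of arrays, each of shape (n_samples, d_j).
    Returns G (n_samples, k), a shared representation with orthonormal
    columns, and a list of per-view projection matrices U_j (d_j, k)
    chosen so that X_j @ U_j approximates G.
    `reg` is a hypothetical ridge term added for numerical stability.
    """
    n = views[0].shape[0]
    centered = [X - X.mean(axis=0) for X in views]

    # Sum of (regularized) projection matrices onto each view's column space.
    M = np.zeros((n, n))
    inv_covs = []
    for X in centered:
        C_inv = np.linalg.inv(X.T @ X + reg * np.eye(X.shape[1]))
        inv_covs.append(C_inv)
        M += X @ C_inv @ X.T

    # eigh returns eigenvalues in ascending order; keep the top-k eigenvectors.
    _, eigvecs = np.linalg.eigh(M)
    G = eigvecs[:, -k:][:, ::-1]

    # Least-squares (ridge) map from each view to the shared representation.
    U = [C_inv @ X.T @ G for C_inv, X in zip(inv_covs, centered)]
    return G, U


# Synthetic example: three "omics" views driven by a shared 2-D latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 2))
views = [
    z @ rng.normal(size=(2, d)) + 0.1 * rng.normal(size=(100, d))
    for d in (5, 6, 7)
]
G, U = linear_gcca(views, k=2)
print(G.shape, [u.shape for u in U])
```

In a deep variant, each view would first pass through its own network before this step, and a supervised variant would additionally tie the shared representation to phenotype labels; both extensions are outside the scope of this sketch.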