Paper Title

A robust kernel machine regression towards biomarker selection in multi-omics datasets of osteoporosis for drug discovery

Authors

Alam, Md Ashad; Shen, Hui; Deng, Hong-Wen

Abstract

Many statistical machine learning approaches can highlight novel features of the etiology of complex diseases by analyzing multi-omics data. However, they are sensitive to deviations from the assumed distribution when the observed samples are potentially contaminated with adversarially corrupted outliers (e.g., from a fictional data distribution). Likewise, statistical methods lag behind in supporting comprehensive data-driven analyses that integrate complex multi-omics data. We propose a novel non-linear M-estimator-based approach, "robust kernel machine regression (RobKMR)," to improve the robustness of statistical machine regression and to accommodate the diversity of fictional data when examining the higher-order composite effects of multi-omics datasets. We use a robust kernel-centered Gram matrix to estimate the model parameters accurately. We also propose a robust score test to assess the marginal and joint (Hadamard-product) effects of features from multi-omics data. We apply the proposed approach to a multi-omics dataset of osteoporosis (OP) from Caucasian females. Experiments demonstrate that the proposed approach effectively identifies inter-related risk factors of OP. With solid evidence (p-value = 0.00001), biological validation, network-based analysis, causal inference, and drug repurposing, the three selected triplets ((DKK1, SMTN, DRGX), (MTND5, FASTKD2, CSMD3), (MTND5, COG3, CSMD3)) are significant biomarkers and directly relate to bone mineral density (BMD). Overall, the top three selected genes (DKK1, MTND5, FASTKD2) and one additional gene (SIDT1, p-value = 0.001) bind significantly to four of the 30 candidate drugs for repurposing in OP: Tacrolimus, Ibandronate, Alendronate, and Bazedoxifene. Further, the proposed approach can be applied to any disease model for which multi-omics datasets are available.
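
The abstract names three kernel building blocks: a Gram matrix per omics view, a robust kernel-centered Gram matrix, and the Hadamard (element-wise) product used to model joint effects across views. The following is a minimal sketch of these ideas under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel with bandwidth 1, and it builds the robust centering by replacing the plain feature-space mean with a Huber-weighted kernel mean estimated by iteratively reweighted least squares (a common robust-kernel construction; the paper's exact M-estimator, loss, and tuning constant may differ). All function names are hypothetical, and the robust score test is not reproduced here.

```python
import math

def gaussian_gram(X, sigma=1.0):
    """Gaussian (RBF) Gram matrix: K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(X)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2.0 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def robust_kernel_mean_weights(K, c=None, n_iter=50, tol=1e-10):
    """Hypothetical helper: Huber-type IRLS weights w (summing to 1) for a robust
    kernel mean m = sum_i w_i * phi(x_i). Samples far from m in feature space
    are down-weighted. If c is None, the threshold is a median-type distance."""
    n = len(K)
    w = [1.0 / n] * n
    for _ in range(n_iter):
        Kw = [sum(K[i][j] * w[j] for j in range(n)) for i in range(n)]
        wKw = sum(w[i] * Kw[i] for i in range(n))
        # Feature-space distance ||phi(x_i) - m||, clipped at 0 against round-off.
        d = [math.sqrt(max(K[i][i] - 2.0 * Kw[i] + wKw, 0.0)) for i in range(n)]
        thr = c if c is not None else sorted(d)[n // 2]
        # Huber weight psi(d)/d: 1 inside the threshold, thr/d outside it.
        raw = [1.0 if di <= thr else thr / di for di in d]
        total = sum(raw)
        new_w = [r / total for r in raw]
        if max(abs(a - b) for a, b in zip(new_w, w)) < tol:
            return new_w
        w = new_w
    return w

def robust_center_gram(K, w):
    """Centered Gram matrix <phi(x_i) - m, phi(x_j) - m> with m = sum_k w_k phi(x_k).
    With uniform weights w_k = 1/n this equals the usual H K H centering."""
    n = len(K)
    Kw = [sum(K[i][k] * w[k] for k in range(n)) for i in range(n)]
    wKw = sum(w[i] * Kw[i] for i in range(n))
    return [[K[i][j] - Kw[i] - Kw[j] + wKw for j in range(n)] for i in range(n)]

def hadamard(*mats):
    """Element-wise (Hadamard) product of equally sized square matrices."""
    n = len(mats[0])
    out = [[1.0] * n for _ in range(n)]
    for M in mats:
        for i in range(n):
            for j in range(n):
                out[i][j] *= M[i][j]
    return out

# Toy data: 6 samples from two hypothetical omics views; sample 5 is an outlier.
X1 = [[0.1], [0.2], [0.0], [0.15], [0.05], [5.0]]
X2 = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [1.0, 0.05], [0.95, 0.0], [-4.0, 6.0]]

K1, K2 = gaussian_gram(X1), gaussian_gram(X2)
w1 = robust_kernel_mean_weights(K1)
K1c = robust_center_gram(K1, w1)   # robustly centered Gram matrix of view 1
K_joint = hadamard(K1, K2)         # joint (interaction-type) kernel of both views
print([round(x, 3) for x in w1])
```

With uniform weights, `robust_center_gram` reduces to standard centering, so the robust version differs only in how much each sample contributes to the feature-space mean; in this toy data the outlying sample receives the smallest weight. The Hadamard product of two valid kernels is itself a valid kernel, which is why it is a natural way to encode joint effects across views.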
