Paper title

Robust structured heterogeneity analysis approach for high-dimensional data

Authors

Sun, Yifan; Luo, Ziye; Fan, Xinyan

Abstract

Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies, and it is complicated by the heterogeneity of diseases. Patients perceived to have the same disease may form multiple subgroups, and different subgroups may have distinct sets of important genes. It is therefore imperative to discover the latent subgroups and to reveal the subgroup-specific important genes. Several heterogeneity analysis methods have been proposed in the recent literature. Despite considerable successes, most existing studies remain limited because they cannot accommodate data contamination and they ignore the interconnections among genes. To address these shortcomings, we develop a robust structured heterogeneity analysis approach that identifies subgroups, selects important genes, and estimates their effects on the phenotype of interest. Possible data contamination is accommodated by employing the Huber loss function. A sparse overlapping group lasso penalty is imposed to conduct regularized estimation and gene identification while taking into account the possibly overlapping cluster structure of genes. The approach adopts an iterative strategy similar in spirit to K-means clustering. Simulations demonstrate that the proposed approach outperforms alternatives in revealing the heterogeneity and in selecting important genes for each subgroup. The analysis of Cancer Cell Line Encyclopedia data leads to biologically meaningful findings with improved prediction and grouping stability.
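To make the ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It alternates between refitting per-subgroup coefficients under a Huber loss and reassigning each sample to the subgroup with the smallest loss, in the manner of K-means. As a loudly stated simplification, the sparse overlapping group lasso penalty is replaced by a plain lasso penalty solved with proximal gradient descent. All function names, the tuning values (`delta`, `lam`), and the random initialization are assumptions made for illustration.

```python
import numpy as np

def huber_loss(residuals, delta=1.345):
    """Elementwise Huber loss: quadratic for |r| <= delta, linear beyond."""
    r = np.abs(residuals)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

def huber_grad(residuals, delta=1.345):
    """Derivative of the Huber loss with respect to the residual."""
    return np.clip(residuals, -delta, delta)

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (plain lasso, a stand-in penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_subgroup(X, y, lam=0.05, delta=1.345, n_steps=500):
    """Proximal gradient on mean Huber loss + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    # The Huber loss has curvature at most 1, so ||X||_2^2 / n bounds
    # the Lipschitz constant of the gradient; use its inverse as step size.
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 / n, 1e-12)
    for _ in range(n_steps):
        grad = -X.T @ huber_grad(y - X @ beta, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def assign_subgroups(X, y, betas, delta=1.345):
    """Assign each sample to the subgroup whose fit gives the smallest loss."""
    losses = np.stack([huber_loss(y - X @ b, delta) for b in betas], axis=1)
    return losses.argmin(axis=1)

def heterogeneity_fit(X, y, k=2, n_iter=20, seed=0):
    """Alternate refitting and reassignment until the labels stop changing."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(y))  # assumed random initialization
    betas = [np.zeros(X.shape[1]) for _ in range(k)]
    for _ in range(n_iter):
        for g in range(k):
            mask = labels == g
            if mask.any():  # keep the previous estimate for an empty subgroup
                betas[g] = fit_subgroup(X[mask], y[mask])
        new_labels = assign_subgroups(X, y, betas)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas
```

Calling `heterogeneity_fit(X, y, k=2)` on a design matrix `X` and response `y` returns one subgroup label per sample and one coefficient vector per subgroup. Like K-means, such an alternating scheme can depend on its starting labels, so in practice one would likely try several initializations.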
